TITLE: Improved fusion proteases and methods for producing them

FIELD OF INVENTION 

[...] microbially related proteases are notably difficult to produce in [...].

The present invention provides methods for producing such proteases by expressing them as
[...] a heterologous pro-sequence. The invention further provides the
resulting proteases comprising such heterologous prosequences.

The present invention relates to isolated polypeptides having protease activity related to
a Nocardiopsis sp. protease, and isolated nucleic acid sequences encoding such proteases. The
invention furthermore relates to nucleic acid constructs, vectors, and host cells [...]
as well as methods for [...].

BACKGROUND

Polypeptides having protease activity, or proteases, are sometimes also designated [...].

Proteases may be of the exo-type that hydrolyses peptides starting at either end thereof, or of
the endo-type that [...] polypeptide chains (endopeptidases). Endopeptidases show [...].

The term "protease" is defined herein as an enzyme that hydrolyses peptide bonds. It
includes [...] (including each of the [...] subclasses thereof). The EC number refers to
Enzyme Nomenclature 1992 from NC-IUBMB, [...] and Eur. J. Biochem. 1999, 264, 610-650,
respectively. The [...] at http://www.[...].

[...] 2002/0182672 A1 that [...] one or two of the last two amino acids [...].



Another disclosure reported that proline residues at the C-terminus of nascent
polypeptide chains induce degradation of the polypeptide (2002. Proline residues at the C terminus
of nascent chains induce SsrA tagging during translation termination. J. Biol. Chem. 277:33825-
33832).

SUMMARY OF THE INVENTION 

It is a well-known problem in the art of expressing polypeptides having proteolytic activity
that many such polypeptides are inherently unstable: they may be subject to autoproteolysis, or
they may be targeted for degradation by other proteases already during their production, resulting
in sub-optimal yields. Many other factors may contribute to their instability, not all of which are
understood at present. It is of great interest to provide proteolytic polypeptides with increased
stability, which may thus be produced in higher yields.

Secreted proteases of the S2A and/or S1E classification often have a pro-region which is
cleaved off from the protease to produce the mature part of the protease. The present inventors
have found that production of S2A and/or S1E proteases as fusion polypeptides comprising a
heterologous pro-region results in much improved yields when compared with production of the
unaltered wild-type proteases.

Accordingly, in a first aspect the invention relates to a secreted polypeptide which has 
alpha-lytic endopeptidase activity, which polypeptide comprises a heterologous pro-region, and 
which polypeptide: 

(a) comprises an amino acid sequence which is at least 70% identical, or preferably
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the amino acid sequence of the mature part of the polypeptide
shown in SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 47; or SEQ ID NO: 41;

(b) comprises an amino acid sequence which is at least 70% identical, or preferably
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the amino acid sequence of the mature part of the polypeptide
encoded by the polynucleotide in SEQ ID NO: 1; SEQ ID NO: 2; SEQ ID NO: 25; SEQ
ID NO: 31; SEQ ID NO: 32; SEQ ID NO: 36; or SEQ ID NO: 40;

(c) comprises a mature part which is a variant of the mature part of the polypeptide having 
the amino acid sequence of SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; or SEQ 
ID NO: 41, the segment comprising a substitution, deletion, extension, and/or insertion 
of one or more amino acids; 

(d) is encoded by a nucleic acid sequence which hybridizes under very low, low, medium-low,
medium, medium-high, high, or very high stringency conditions with:



(I) a polynucleotide encoding the mature part of a protease, said
polynucleotide obtainable from genomic DNA from Nocardiopsis alba DSM
43235 by use of primers SEQ ID NO's: 26 and 27; from Nocardiopsis alba
DSM 15647 by use of primers SEQ ID NO's: 35 and 36; from Nocardiopsis
prasina DSM 15648 by use of primers SEQ ID NO's: 39 and 40; or from
Nocardiopsis prasina DSM 15649 by use of primers SEQ ID NO's: 43 and
40;

(II) the polynucleotide of SEQ ID NO: 1; of SEQ ID NO: 2; of SEQ ID NO: 25; of
SEQ ID NO: 31; of SEQ ID NO: 32; of SEQ ID NO: 36; or of SEQ ID NO:
40;

(III) a subsequence of (I) or (II) of at least 500 nucleotides, preferably 400, 300,
200, or 100 nucleotides, or

(IV) a complementary strand of (I), (II), or (III);

(e) is an allelic variant of (a), (b), (c), or (d); or 

(f) is a fragment of (a), (b), (c), (d), or (f).

In a second aspect, the invention relates to an isolated polynucleotide encoding a 
polypeptide as defined in the first aspect. 

In a third aspect, the invention relates to a recombinant expression vector or
polynucleotide construct comprising a polynucleotide as defined in the previous aspect.

Yet a fourth aspect relates to a recombinant host cell comprising a polynucleotide as 
defined in the second aspect, or an expression vector or polynucleotide construct as defined in the 
previous aspect. 

In a fifth aspect, the invention also relates to a transgenic plant, or plant part, comprising a 
polynucleotide as defined in the second aspect, or an expression vector or polynucleotide 
construct as defined in the third aspect. 

The sixth aspect of the invention relates to a transgenic, non-human animal, or products, 
or elements thereof, comprising a polynucleotide as defined in the second aspect, or an 
expression vector or polynucleotide construct as defined in the third aspect. 

The seventh aspect of the invention relates to a method for producing a polypeptide as
defined in the first aspect, the method comprising: (a) cultivating a recombinant host cell as
defined in the fourth aspect, or a transgenic plant or animal as defined in the fifth or sixth aspects, 
to produce a supernatant comprising the polypeptide, and optionally (b) recovering the 
polypeptide. 

Other aspects of the invention relate to: an animal feed additive comprising at least one
polypeptide as defined in the first aspect; and

(a) at least one fat-soluble vitamin, and/or

(b) at least one water-soluble vitamin, and/or 

(c) at least one trace mineral; 

an animal feed composition having a crude protein content of 50 to 800 g/kg and 
comprising at least one polypeptide as defined in the first aspect, or at least one feed additive of 
the previous aspect; 

a composition comprising at least one polypeptide as defined in the first aspect, together
with at least one other enzyme selected from amongst phytase (EC 3.1.3.8 or 3.1.3.26); xylanase
(EC 3.2.1.8); galactanase (EC 3.2.1.89); alpha-galactosidase (EC 3.2.1.22); protease (EC 3.4.-.-);
phospholipase A1 (EC 3.1.1.32); phospholipase A2 (EC 3.1.1.4); lysophospholipase (EC 3.1.1.5);
phospholipase C (EC 3.1.4.3); phospholipase D (EC 3.1.4.4); and/or beta-glucanase (EC 3.2.1.4 or
EC 3.2.1.6);

a method for using at least one polypeptide as defined in the first aspect, for improving the 
nutritional value of an animal feed, for increasing digestible and/or soluble protein in animal diets, 
for increasing the degree of hydrolysis of proteins in animal diets, and/or for the treatment of 
vegetable proteins, the method comprising including the polypeptide(s) in animal feed, and/or in a
composition for use in animal feed; 

a method for using at least one polypeptide as defined in the first aspect, comprising 
including the polypeptide(s) in a detergent formulation.

DETAILED DESCRIPTION OF THE INVENTION 

Proteases are classified on the basis of their catalytic mechanism into the following groups:
Serine proteases (S), Cysteine proteases (C), Aspartic proteases (A), Metalloproteases (M), and
Unknown, or as yet unclassified, proteases (U); see Handbook of Proteolytic Enzymes,
A.J. Barrett, N.D. Rawlings, J.F. Woessner (eds), Academic Press (1998), in particular the general
introduction part.

Serine proteases are ubiquitous, being found in viruses, bacteria and eukaryotes; they
include exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20
families (denoted S1 - S27) of serine proteases have been identified, these being grouped into 6
clans denoted SA, SB, SC, SE, SF, and SG, on the basis of structural similarity and functional
evidence (Barrett et al. 1998. Handbook of proteolytic enzymes). Structures are known for at least
four of the clans (SA, SB, SC and SE); these appear to be totally unrelated, suggesting at least
four evolutionary origins of serine peptidases. Alpha-lytic endopeptidases belong to the
chymotrypsin (SA) clan, within which they have been assigned to subfamily A of the S2 family
(S2A).

Another classification system of proteolytic enzymes is based on sequence information,
and is therefore used more often in the art of molecular biology; it is described in Rawlings, N.D. et
al., 2002, MEROPS: The protease database. Nucleic Acids Res. 30:343-346. The MEROPS
database is freely available electronically at http://www.merops.ac.uk . According to the MEROPS
system, the proteolytic enzymes classified as S2A in the 'Handbook of Proteolytic Enzymes' are
in MEROPS classified as 'S1E' proteases (Rawlings ND, Barrett AJ. (1993) Evolutionary families
of peptidases, Biochem. J. 290:205-218).

In particular embodiments, the proteases of the invention and for use according to the
invention are selected from the group consisting of:

(a) proteases belonging to the EC 3.4.-.- enzyme group;

(b) Serine proteases belonging to the S group of the above Handbook;

(c1) Serine proteases of peptidase family S2A;

(c2) Serine proteases of peptidase family S1E as described in Biochem. J. 290:205-218 (1993)
and in MEROPS, a protease database, release 6.20, March 24, 2003 (www.merops.ac.uk). The
database is described in Rawlings, N.D., O'Brien, E.A. & Barrett, A.J. (2002) MEROPS: the
protease database. Nucleic Acids Res. 30, 343-346.

For determining whether a given protease is a Serine protease, and a family S2A protease, 
reference is made to the above Handbook and the principles indicated therein. Such 
determination can be carried out for all types of proteases, be it naturally occurring or wild-type 
proteases; or genetically engineered or synthetic proteases. 

Protease activity can be measured using any assay in which a substrate is employed that
includes peptide bonds relevant for the specificity of the protease in question. Assay-pH and
assay-temperature are likewise to be adapted to the protease in question. Examples of assay-pH-values
are pH 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12. Examples of assay-temperatures are 30, 35, 37,
40, 45, 50, 55, 60, 65, 70, 80, 90, or 95°C.

Examples of protease substrates are casein, such as Azurine-Crosslinked Casein (AZCL- 
casein). Two protease assays are described in Example 2 herein, either of which can be used to 
determine protease activity. For the purposes of this invention, the so-called pNA Assay is a 
preferred assay. 

There are no limitations on the origin of the protease of the invention and/or for use
according to the invention. Thus, the term protease includes not only natural or wild-type
proteases obtained from microorganisms of any genus, but also any mutants, variants, fragments
etc. thereof exhibiting protease activity, as well as synthetic proteases, such as shuffled
proteases, and consensus proteases. Such genetically engineered proteases can be prepared as
is generally known in the art, e.g. by site-directed mutagenesis, by PCR (using a PCR fragment
containing the desired mutation as one of the primers in the PCR reactions), or by random
mutagenesis. The preparation of consensus proteins is described in e.g. EP 897985. The term
"obtained from" as used herein in connection with a given source shall mean that the polypeptide
encoded by the nucleic acid sequence is produced by the source or by a cell in which the nucleic
acid sequence from the source is present. In a preferred embodiment, the polypeptide is secreted
[...].

In a specific embodiment, the protease is a low-allergenic variant, designed to invoke a
reduced immunological response when exposed to animals, including man. The term
immunological response is to be understood as any reaction by the immune system of an animal
exposed to the protease. One type of immunological response is an allergic response leading to
increased levels of IgE in the exposed animal. Low-allergenic variants may be prepared using
techniques known in the art. For example, the protease may be conjugated with polymer moieties
shielding portions or epitopes of the protease involved in an immunological response. Conjugation
with polymers may involve in vitro chemical coupling of polymer to the protease, e.g. as described
in WO 96/17929, WO [...], WO 98/35026, and/or WO 99/00489. Conjugation may in addition
or alternatively thereto involve in vivo coupling of polymers to the protease. Such conjugation may
be achieved by genetic engineering of the nucleotide sequence encoding the protease, inserting
consensus sequences encoding additional glycosylation sites in the protease and expressing the
protease in a host capable of glycosylating the protease, see e.g. WO 00/26354. Another way of
providing low-allergenic variants is genetic engineering of the nucleotide sequence encoding the
protease so as to cause the protease to self-oligomerize, effecting that protease monomers may
shield the epitopes of other protease monomers and thereby lowering the antigenicity of the
oligomers. Such products and their preparation are described e.g. in WO 96/16177. Epitopes
involved in an immunological response may be identified by various methods such as the phage
display method described in WO 00/26230 and WO 01/83559, or the random approach described
in EP 561907. Once an epitope has been identified, its amino acid sequence may be altered to
produce altered immunological properties of the protease by known gene manipulation techniques
such as site directed mutagenesis (see e.g. WO 00/26230, WO 00/26354 and/or WO 00/22103).

The first aspect of the invention relates to a secreted polypeptide which has alpha-lytic
endopeptidase activity, which polypeptide comprises a heterologous pro-region, and which
polypeptide:

(a) comprises an amino acid sequence which is at least 70% identical, or preferably
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the amino acid sequence of the mature part of the polypeptide
shown in SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 47; or SEQ ID NO: 41;

(b) comprises an amino acid sequence which is at least 70% identical, or preferably
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the amino acid sequence of the mature part of the polypeptide
encoded by the polynucleotide in SEQ ID NO: 1; SEQ ID NO: 2; SEQ ID NO: 25; SEQ
ID NO: 31; SEQ ID NO: 32; SEQ ID NO: 36; or SEQ ID NO: 40;

(c) comprises a mature part which is a variant of the mature part of the polypeptide having
the amino acid sequence of SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; or SEQ
ID NO: 41, the segment comprising a substitution, deletion, extension, and/or insertion
of one or more amino acids;

(d) is encoded by a nucleic acid sequence which hybridizes under very low, low, medium-low,
medium, medium-high, high, or very high stringency conditions with:

(I) a polynucleotide encoding the mature part of a protease, said
polynucleotide obtainable from genomic DNA from Nocardiopsis alba DSM
43235 by use of primers SEQ ID NO's: 26 and 27; from Nocardiopsis alba
DSM 15647 by use of primers SEQ ID NO's: 35 and 36; from Nocardiopsis
prasina DSM 15648 by use of primers SEQ ID NO's: 39 and 40; or from
Nocardiopsis prasina DSM 15649 by use of primers SEQ ID NO's: 43 and
40;

(II) the polynucleotide of SEQ ID NO: 1; of SEQ ID NO: 2; of SEQ ID NO: 25; of
SEQ ID NO: 31; of SEQ ID NO: 32; of SEQ ID NO: 36; or of SEQ ID NO:
40;

(III) a subsequence of (I) or (II) of at least 500 nucleotides, preferably 400, 300, 
200, or 100 nucleotides, or 

(IV) a complementary strand of (I), (II), or (III); 

(e) is an allelic variant of (a), (b), (c), or (d); or 

(f) is a fragment of (a), (b), (c), (d), or (f).

For the purposes of the present invention, the degree of identity between two amino acid
sequences, as well as the degree of identity between two nucleotide sequences, is determined by
the program "align", which is a Needleman-Wunsch alignment (i.e. a global alignment). The
program is used for alignment of polypeptide, as well as nucleotide sequences. The default
scoring matrix BLOSUM50 is used for polypeptide alignments, and the default identity matrix is
used for nucleotide alignments. The penalty for the first residue of a gap is -12 for polypeptides
and -16 for nucleotides. The penalties for further residues of a gap are -2 for polypeptides and -4
for nucleotides.

"Align" is part of the FASTA package version v20u6 (see W. R. Pearson and D. J. Lipman
(1988), "Improved Tools for Biological Sequence Analysis", PNAS 85:2444-2448, and W. R.
Pearson (1990) "Rapid and Sensitive Sequence Comparison with FASTP and FASTA," Methods
in Enzymology 183:63-98). FASTA protein alignments use the Smith-Waterman algorithm with no
limitation on gap size (see "Smith-Waterman algorithm", T. F. Smith and M. S. Waterman (1981)
J. Mol. Biol. 147:195-197).
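
The global alignment scoring described above can be illustrated with a minimal Python sketch. It is not the "align" program itself: it is a generic Needleman-Wunsch implementation with separate penalties for the first and further residues of a gap, using the nucleotide penalties stated above (-16 and -4). The match and mismatch scores (+5 and -4), the function names, and the choice to report identity as identical columns divided by alignment length are assumptions made for illustration only; the text does not specify them.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def global_align(a: str, b: str, match: int = 5, mismatch: int = -4,
                 gap_first: int = -16, gap_next: int = -4) -> Tuple[str, str]:
    """Global alignment; match/mismatch scores are illustrative assumptions."""
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_first + (i - 1) * gap_next
    for j in range(1, m + 1):
        Y[0][j] = gap_first + (j - 1) * gap_next
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_first, Y[i - 1][j] + gap_first,
                          X[i - 1][j] + gap_next)
            Y[i][j] = max(M[i][j - 1] + gap_first, X[i][j - 1] + gap_first,
                          Y[i][j - 1] + gap_next)

    # Trace back from the best-scoring end state to recover the alignment.
    tables = {"M": M, "X": X, "Y": Y}
    state = max(tables, key=lambda k: tables[k][n][m])
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        cur = tables[state][i][j]
        if state == "M":
            s = match if a[i - 1] == b[j - 1] else mismatch
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
            state = next(k for k in "MXY" if tables[k][i][j] + s == cur)
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
            state = "X" if X[i][j] + gap_next == cur else next(
                k for k in "MY" if tables[k][i][j] + gap_first == cur)
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
            state = "Y" if Y[i][j] + gap_next == cur else next(
                k for k in "MX" if tables[k][i][j] + gap_first == cur)
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length (assumed definition)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

For polypeptides the same routine would be called with gap_first=-12 and gap_next=-2, and the match/mismatch scores would be replaced by a lookup in the BLOSUM50 matrix, which is omitted here for brevity.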

The degree of identity between two amino acid sequences may also be determined by the
Clustal method (Higgins, 1989, CABIOS 5: 151-153) using the LASERGENE™ MEGALIGN™
software (DNASTAR, Inc., Madison, WI) with an identity table and the following multiple alignment
parameters: Gap penalty of 10, and gap length penalty of 10. Pairwise alignment parameters are
Ktuple=1, gap penalty=3, windows=5, and diagonals=5. The degree of identity between two
nucleotide sequences may be determined using the same algorithm and software package as
described above with the following settings: Gap penalty of 10, and gap length penalty of 10.
Pairwise alignment parameters are Ktuple=3, gap penalty=3 and windows=20.

A fragment of one of the encoding polynucleotide sequences of the invention is a
polynucleotide which encodes a polypeptide having one or more amino acids deleted from the
amino and/or carboxyl terminus compared to the full-length amino acid sequence. In one
embodiment a fragment encodes at least 75 amino acid residues, or at least 100 amino acid
residues, or at least 125 amino acid residues, or at least 150 amino acid residues, or at least 160
amino acid residues, or at least 165 amino acid residues, or at least 170 amino acid residues, or
at least 175 amino acid residues.

An allelic variant denotes any of two or more alternative forms of a gene occupying the 
same chromosomal locus. Allelic variation arises naturally through mutation, and may result in 
polymorphism within populations. Gene mutations can be silent (no change in the encoded 
polypeptide) or may encode polypeptides having altered amino acid sequences. An allelic variant 
of a polypeptide is a polypeptide encoded by an allelic variant of a gene. 

The present invention also relates to isolated polypeptides having protease activity and
which are encoded by nucleic acid sequences which hybridize under very low, or low, or low-medium,
medium, medium-high, high, or very high stringency conditions with a nucleic acid probe
which hybridizes under the same conditions with (I) a polynucleotide encoding the mature part of a
protease obtainable from genomic DNA from Nocardiopsis alba DSM 43235 by use of primers
SEQ ID NO's: 26 and 27; from Nocardiopsis alba DSM 15647 by use of primers SEQ ID NO's: 35
and 36; from Nocardiopsis prasina DSM 15648 by use of primers SEQ ID NO's: 39 and 40; or
from Nocardiopsis prasina DSM 15649 by use of primers SEQ ID NO's: 43 and 40; (II) the
polynucleotide of SEQ ID NO: 1; of SEQ ID NO: 2; of SEQ ID NO: 25; of SEQ ID NO: 31; of SEQ
ID NO: 32; of SEQ ID NO: 36; or of SEQ ID NO: 40; (III) a subsequence of (I) or (II) of at least
500 nucleotides, preferably 400, 300, 200, or 100 nucleotides, or (IV) a complementary strand of
(I), (II), or (III) (J. Sambrook, E.F. Fritsch, and T. Maniatis, 1989, Molecular Cloning, A Laboratory
Manual, 2nd edition, Cold Spring Harbor, New York). In one particular embodiment the nucleic
acid probe is selected from amongst the nucleic acid sequences of (a), (b), or (c) above. A
polynucleotide corresponding to the mature peptide encoding part of SEQ ID NO: 1, SEQ ID NO:
25, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 36, or SEQ ID NO: 40 is a preferred probe.

The nucleic acid sequences of SEQ ID NO: 1, SEQ ID NO: 25, SEQ ID NO: 31, SEQ ID
NO: 32, SEQ ID NO: 36, or SEQ ID NO: 40, or a subsequence thereof, as well as the amino acid
sequences of SEQ ID NO: 28; SEQ ID NO: 30; SEQ ID NO: 33; SEQ ID NO: 37, or SEQ ID NO:
41, or a fragment thereof, and even a genomic polynucleotide encoding a protease obtainable
from genomic DNA from Nocardiopsis alba DSM 43235 by use of primers SEQ ID NO's: 26 and
27; from Nocardiopsis alba DSM 15647 by use of primers SEQ ID NO's: 35 and 36; from
Nocardiopsis prasina DSM 15648 by use of primers SEQ ID NO's: 39 and 40; or from
Nocardiopsis prasina DSM 15649 by use of primers SEQ ID NO's: 43 and 40, or a subsequence
thereof, may be used to design a nucleic acid probe to identify and clone DNA encoding
polypeptides having protease activity from strains of different genera or species according to
methods well known in the art. In particular, such probes can be used for hybridization with the
genomic or cDNA of the genus or species of interest, following standard Southern blotting
procedures, in order to identify and isolate the corresponding gene therein. Such probes can be
considerably shorter than the entire sequence, but should be at least 15, preferably at least 25,
and more preferably at least 35 nucleotides in length. Longer probes can also be used. Both DNA
and RNA probes can be used. The probes are typically labeled for detecting the corresponding
gene (for example, with 32P, 3H, 35S, biotin, or avidin). Such probes are encompassed by the
present invention.

Thus, a genomic DNA or cDNA library prepared from such other organisms may be
screened for DNA that hybridizes with the probes described above and which encodes a
polypeptide having protease activity. Genomic or other DNA from such other organisms may be
separated by agarose or polyacrylamide gel electrophoresis, or other separation techniques. DNA
from the libraries or the separated DNA may be transferred to and immobilized on nitrocellulose or
other suitable carrier material. In order to identify a clone or DNA which is homologous with SEQ
ID NO: 1 or a subsequence thereof, the carrier material is used in a Southern blot. For purposes of
the present invention, hybridization indicates that the nucleic acid sequence hybridizes to a
labeled nucleic acid probe corresponding to the nucleic acid sequence shown in SEQ ID NO: 1, its
complementary strand, or a subsequence thereof, under very low to very high stringency
conditions. Molecules to which the nucleic acid probe hybridizes under these conditions are
detected using X-ray film.

For long probes of at least 100 nucleotides in length, very low to very high stringency
conditions are defined as prehybridization and hybridization at 42°C in 5X SSPE, 0.3% SDS, 200
µg/ml sheared and denatured salmon sperm DNA, and either 25% formamide for very low and low
stringencies, 35% formamide for medium and medium-high stringencies, or 50% formamide for
high and very high stringencies, following standard Southern blotting procedures.

For long probes of at least 100 nucleotides in length, the carrier material is finally washed
three times each for 15 minutes using 0.2 x SSC, 0.2% SDS, 20% formamide, preferably at least at
45°C (very low stringency), more preferably at least at 50°C (low stringency), more preferably at
least at 55°C (medium stringency), more preferably at least at 60°C (medium-high stringency),
even more preferably at least at 65°C (high stringency), and most preferably at least at 70°C (very
high stringency).
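
The long-probe conditions above amount to a small lookup table. The following Python sketch transcribes the formamide percentages and minimum wash temperatures stated in the two preceding paragraphs; the dictionary and function names are hypothetical and serve only to make the mapping explicit.

```python
from typing import Dict, Optional

# Stringency levels in increasing order, as listed in the text.
STRINGENCY_ORDER = ["very low", "low", "medium", "medium-high", "high", "very high"]

# Formamide percentage used during prehybridization and hybridization at 42 degrees C.
HYBRIDIZATION_FORMAMIDE_PERCENT: Dict[str, int] = {
    "very low": 25, "low": 25,
    "medium": 35, "medium-high": 35,
    "high": 50, "very high": 50,
}

# Minimum temperature (degrees C) of the final washes for each stringency level.
MIN_WASH_TEMPERATURE_C: Dict[str, int] = {
    "very low": 45, "low": 50, "medium": 55,
    "medium-high": 60, "high": 65, "very high": 70,
}


def long_probe_conditions(stringency: str) -> Dict[str, int]:
    """Return the hybridization and wash settings for a named stringency level."""
    key = stringency.strip().lower()
    if key not in MIN_WASH_TEMPERATURE_C:
        raise ValueError(f"unknown stringency level: {stringency!r}")
    return {
        "hybridization_temperature_c": 42,
        "formamide_percent": HYBRIDIZATION_FORMAMIDE_PERCENT[key],
        "min_wash_temperature_c": MIN_WASH_TEMPERATURE_C[key],
    }


def highest_stringency_for_wash(wash_temperature_c: float) -> Optional[str]:
    """Return the highest level whose minimum wash temperature is reached, if any."""
    reached = None
    for level in STRINGENCY_ORDER:
        if wash_temperature_c >= MIN_WASH_TEMPERATURE_C[level]:
            reached = level
    return reached
```

For instance, a final wash at 62°C satisfies the wash temperature for medium-high stringency but not for high stringency.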

For short probes about 15 nucleotides to about 70 nucleotides in length, stringency
conditions are defined as prehybridization, hybridization, and washing post-hybridization at 5°C to
10°C below the calculated Tm using the calculation according to Bolton and McCarthy (1962,
Proceedings of the National Academy of Sciences USA 48:1390) in 0.9 M NaCl, 0.09 M Tris-HCl
pH 7.6, 6 mM EDTA, 0.5% NP-40, 1X Denhardt's solution, 1 mM sodium pyrophosphate, 1 mM
sodium monobasic phosphate, 0.1 mM ATP, and 0.2 mg of yeast RNA per ml following standard
Southern blotting procedures.

For short probes about 15 nucleotides to about 70 nucleotides in length, the carrier
material is washed once in 6X SSC plus 0.1% SDS for 15 minutes and twice each for 15 minutes
using 6X SSC at 5°C to 10°C below the calculated Tm.

The present invention also relates to variants of the polypeptide of the invention 
comprising a substitution, deletion, and/or insertion of one or more amino acids. 

In a particular embodiment, amino acid changes are of a minor nature, that is conservative
amino acid substitutions that do not significantly affect the folding and/or activity of the protein;
small deletions, typically of one to about 30 amino acids; small amino- or carboxyl-terminal
extensions, such as an amino-terminal methionine residue; a small peptide of up to about 20-25
residues; or a small extension that facilitates purification by changing net charge or another
function, such as a poly-histidine tract, an antigenic epitope or a binding domain.

Examples of conservative substitutions are within the group of basic amino acids (arginine,
lysine and histidine), acidic amino acids (glutamic acid and aspartic acid), polar amino acids
(glutamine and asparagine), hydrophobic amino acids (leucine, isoleucine and valine), aromatic
amino acids (phenylalanine, tryptophan and tyrosine), and small amino acids (glycine, alanine,
serine, threonine and methionine). Amino acid substitutions which do not generally alter the
specific activity are known in the art and are described, for example, by H. Neurath and R.L. Hill
([...] Academic Press, New York). The most commonly occurring exchanges are
Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr, Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro,
Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, and Asp/Gly, as well as these in reverse.
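
The six groups of conservative substitutions named above can be written down as a lookup, shown here as a minimal Python sketch using one-letter amino acid codes. The names CONSERVATIVE_GROUPS, group_of and is_conservative are hypothetical, and the check covers only the within-group criterion; it does not encode the separate list of commonly occurring exchanges, some of which (for example Ser/Asn) cross group boundaries.

```python
from typing import Dict, Optional, Set

# Groups as listed in the text, in one-letter amino acid codes.
CONSERVATIVE_GROUPS: Dict[str, Set[str]] = {
    "basic": set("RKH"),        # arginine, lysine, histidine
    "acidic": set("ED"),        # glutamic acid, aspartic acid
    "polar": set("QN"),         # glutamine, asparagine
    "hydrophobic": set("LIV"),  # leucine, isoleucine, valine
    "aromatic": set("FWY"),     # phenylalanine, tryptophan, tyrosine
    "small": set("GASTM"),      # glycine, alanine, serine, threonine, methionine
}


def group_of(residue: str) -> Optional[str]:
    """Return the name of the group containing the residue, or None if it is in no group."""
    code = residue.upper()
    for name, members in CONSERVATIVE_GROUPS.items():
        if code in members:
            return name
    return None


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ and belong to the same listed group."""
    first, second = group_of(original), group_of(replacement)
    return (first is not None and first == second
            and original.upper() != replacement.upper())
```

Residues that appear in none of the listed groups (cysteine and proline, for instance) are reported as belonging to no group, so any substitution involving them is treated as non-conservative by this sketch.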

In a particular embodiment, the polypeptides of the invention and for use according to the
invention are acid-stable. For the present purposes, the term acid-stable means that the residual
activity after 2 hours of incubation at pH 3.0 and 37°C is at least 50%, as compared to the
residual activity of a corresponding sample incubated for 2 hours at pH 9.0 and 5°C. In a particular
embodiment, the residual activity is at least 60%, 70%, 80% or at least 90%.
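
The acid-stability criterion is a simple ratio of two activity measurements. A minimal Python sketch follows, assuming both activities were measured with the same assay and in the same units; the function and parameter names are hypothetical.

```python
def residual_activity_percent(activity_after_acid: float,
                              activity_reference: float) -> float:
    """Activity of the acid-incubated sample as a percentage of the reference sample."""
    if activity_reference <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * activity_after_acid / activity_reference


def is_acid_stable(activity_after_acid: float, activity_reference: float,
                   threshold_percent: float = 50.0) -> bool:
    """Apply the threshold from the text (50% by default; 60, 70, 80 or 90% in particular embodiments)."""
    return residual_activity_percent(activity_after_acid,
                                     activity_reference) >= threshold_percent
```

Here activity_after_acid stands for the sample incubated for 2 hours at pH 3.0 and 37°C, and activity_reference for the corresponding sample incubated under the reference conditions.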

In particular embodiments, the polypeptide of the invention is i) a bacterial protease; ii) a
protease of the phylum Actinobacteria; iii) of the class Actinobacteria; iv) of the order
Actinomycetales; v) of the family Nocardiopsaceae; vi) of the genus Nocardiopsis; and/or a
protease derived from vii) Nocardiopsis species such as Nocardiopsis alba, Nocardiopsis
[...], Nocardiopsis prasina, composta, [...], halophila, [...], kunsanensis,
listeri, lucentensis, metallicus, synnemataformans, trehalosi, tropica, umidischolae, xinjiangensis,
or Nocardiopsis dassonvillei, for example Nocardiopsis dassonvillei DSM 43235.

The taxonomy is according to the chapter: The road map to the Manual by G.M.
Garrity & J.G. Holt in Bergey's Manual of Systematic Bacteriology, 2001, second edition, volume
1, David R. Boone, Richard W. Castenholz.

It will be understood that for the aforementioned species, the invention encompasses both
the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of
the species name by which they are known. Those skilled in the art will readily recognize the
identity of appropriate equivalents.

Strains of these species are readily accessible to the public in a number of culture
collections, such as the American Type Culture Collection (ATCC), Deutsche Sammlung von
Mikroorganismen und Zellkulturen GmbH (DSM), Centraalbureau Voor Schimmelcultures (CBS),
and Agricultural Research Service Patent Culture Collection, Northern Regional Research Center
[...]. Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235 is publicly available from
Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Braunschweig, [...].
[...] also deposited at other depositary institutions as follows: ATCC 2321[...]

Furthermore, such polypeptides may be identified and obtained from other sources
[...]. The nucleic acid sequence may then be [...] by similarly screening a genomic or
[...] library of another microorganism. Once a nucleic acid sequence encoding a polypeptide
has been detected with the probe(s), the sequence may be isolated or cloned by [...]
techniques which are known to those of ordinary skill in the art (see, e.g., Sambrook
[...]).

As defined herein, an "isolated" polypeptide is a polypeptide which is essentially free of
other polypeptides, e.g., at least about 20% pure, preferably at least about 40% pure, more
preferably about 60% pure, even more preferably about 80% pure, most preferably about [...]
pure, and even most preferably about 95% pure, as determined by SDS-PAGE.

[...] nucleic acid sequences of the present invention [...]
fused polypeptides or cleavable fusion polypeptides in which another polypeptide is fused at the
N-terminus or the C-terminus of the polypeptide or fragment thereof. A fused
polypeptide is [...] a nucleic acid sequence (or a portion thereof) encoding another polypeptide to
a nucleic acid sequence (or a portion thereof) of the present invention. Techniques [...].

Another preferred embodiment relates to a polypeptide of the first aspect, which mature
[...] one or more amino acid(s) [...];
preferably the one or more added amino acid(s) is [...] uncharged; more preferably the one or
more added amino acid(s) is one or more of Q, S, V, A, or P.

Another preferred embodiment relates to a polypeptide of the first aspect, which
[...] of the C-terminus of the polypeptide;

Another preferred embodiment relates to a polypeptide of the first aspect, wherein the one
or more added amino acids [...] from the group consisting of: QSHVQSAP, QSAP, [...],
QL, TP, [...], QP, PI, LT, TQ, IT, QQ, and PQ.

[...] the polypeptides of the present invention were produced in even
[...] when they were expressed as pro-proteases fused to a heterologous
[...]. Accordingly, a preferred embodiment relates to the polypeptide according to the first
aspect which comprises a heterologous secretion signal-peptide which is cleaved from the
polypeptide when the polypeptide is secreted, preferably the heterologous secretion signal peptide
is derived from a heterologous protease, preferably the heterologous secretion signal peptide
comprises an amino acid sequence having a sequence identity of at least 70%, or preferably 75%,
80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%
with the amino acid sequence encoded by polynucleotides 1 - 81 of SEQ ID NO: 2.

Accordingly, a preferred embodiment relates to the polypeptide according to the first 
aspect which comprises a heterologous pro-region from a protease; preferably the pro-region is
derived from an S2A or S1E protease, and most preferably it is at least 70% identical or
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the pro-region shown in SEQ ID NO: 30.
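
The identity thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length by some external alignment tool, and it uses the alignment length as the denominator; the function names and the example peptides are hypothetical, and the alignment program and parameters that this specification relies on for its identity figures are not reproduced here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue.

    Assumes both strings are already aligned (gaps written as '-').
    A gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float = 70.0) -> bool:
    """True if the pair reaches the stated minimum percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum
```

With this convention, two eight-residue peptides differing at one position are 87.5% identical, which would satisfy a 70% or 85% threshold but not a 90% one.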

Nucleic Acid Sequences

The present invention also relates to isolated nucleic acid sequences that encode a
polypeptide of the present invention. Particular nucleic acid sequences of the invention are the
polynucleotides of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 25, SEQ ID NO: 31, SEQ ID NO:
3*, SEQ ID NO: 36, and SEQ ID NO: 40. Another particular nucleic acid sequence of the invention
is the sequence, preferably the mature polypeptide encoding region thereof, which is obtainable
from genomic DNA from Nocardiopsis dassonvillei subspecies dassonvillei DSM 43235. The
present invention also encompasses nucleic acid sequences which encode a polypeptide having
the amino acid sequence of amino acids 1 to 188, or -166 to 188, of SEQ ID NO: 2, which differ
from the corresponding parts of SEQ ID NO: 1 by virtue of the degeneracy of the genetic code.
The present invention also relates to subsequences of the above polynucleotides which encode
polypeptide fragments that have protease activity.
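
The degeneracy of the genetic code mentioned above can be shown with a short sketch: two different nucleotide sequences that translate to the same peptide. The standard genetic code is assumed, and the two example sequences are invented for illustration; they are not taken from any SEQ ID NO of this specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame, uppercase DNA coding sequence ('*' marks a stop)."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encode_same_polypeptide(dna_a: str, dna_b: str) -> bool:
    """True if two coding sequences differ at most by synonymous codons."""
    return translate(dna_a) == translate(dna_b)


# Hypothetical example: both sequences encode the tetrapeptide QSAP.
variant_1 = "CAGTCTGCTCCT"
variant_2 = "CAATCCGCGCCA"
```

Here `variant_1` and `variant_2` differ at four nucleotide positions yet encode the same amino acid sequence.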

A subsequence of a polynucleotide is a nucleic acid sequence from which one or more
nucleotides from the 5' and/or 3' end have been deleted. Preferably, a subsequence contains at
least ... more preferably at least 300 nucleotides, even more preferably at least 375,
450, 500, 531, 600, 700, 800, 900 or 1000 nucleotides. The present invention also relates to
nucleotide sequences which have a degree of identity to the polynucleotide of SEQ ID NO: 1,
..., SEQ ID NO: 38, and SEQ ID NO: 40 of at least 85%, 86%, 87%, 88%, 89%, 90%, 91%,
92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99%.
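
A subsequence in the sense used above is the parent sequence with nucleotides removed only from its ends. The sketch below is illustrative; the function names and the default minimum length are assumptions chosen for the example.

```python
def subsequence(seq: str, del_5prime: int = 0, del_3prime: int = 0) -> str:
    """Return seq with nucleotides deleted from the 5' and/or 3' end only."""
    if del_5prime < 0 or del_3prime < 0:
        raise ValueError("deletion counts must be non-negative")
    if del_5prime + del_3prime >= len(seq):
        raise ValueError("deletions would remove the whole sequence")
    return seq[del_5prime:len(seq) - del_3prime]


def is_long_enough(sub: str, minimum_nucleotides: int = 300) -> bool:
    """Check a subsequence against a minimum length (the default is illustrative)."""
    return len(sub) >= minimum_nucleotides
```

Internal deletions are deliberately not supported, because they would not yield a subsequence as defined in the text.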

The techniques used to isolate or clone a nucleic acid sequence encoding a polypeptide
are known in the art and include isolation from genomic DNA, preparation from cDNA, or a
combination thereof. The cloning of the nucleic acid sequences of the present invention from such
genomic DNA can be effected, e.g., by using the well-known polymerase chain reaction (PCR) or
antibody screening of expression libraries to detect cloned DNA fragments with shared structural
features. See, e.g., Innis et al., 1990, PCR: A Guide to Methods and Application, Academic Press,
New York. Other nucleic acid amplification procedures such as ligase chain reaction (LCR),
ligated activated transcription (LAT) and nucleic acid sequence-based amplification (NASBA) may
be used. The nucleic acid sequence may be cloned from a strain of Nocardiopsis or another or
related organism and thus, for example, may be an allelic or species variant of the polypeptide
encoding region of the nucleic acid sequence.

The term "isolated nucleic acid sequence" as used herein refers to a nucleic acid sequence 
which is essentially free of other nucleic acid sequences, e.g., at least about 20% pure, preferably 
at least about 40% pure, more preferably at least about 60% pure, even more preferably at least 
about 80% pure, and most preferably at least about 90% pure as determined by agarose 
electrophoresis. For example, an isolated nucleic acid sequence can be obtained by standard 
cloning procedures used in genetic engineering to relocate the nucleic acid sequence from its 
natural location to a different site where it will be reproduced. The cloning procedures may involve 
excision and isolation of a desired nucleic acid fragment comprising the nucleic acid sequence 
encoding the polypeptide, insertion of the fragment into a vector molecule, and incorporation of 
the recombinant vector into a host cell where multiple copies or clones of the nucleic acid 
sequence will be replicated. The nucleic acid sequence may be of genomic, cDNA, RNA, 
semisynthetic, synthetic origin, or any combinations thereof. 

Modification of a nucleic acid sequence encoding a polypeptide of the present invention 
may be necessary for the synthesis of polypeptides substantially similar to the polypeptide. The 
term "substantially similar" to the polypeptide refers to non-naturally occurring forms of the
polypeptide. These polypeptides may differ in some engineered way from the polypeptide isolated 
from its native source, e.g., variants that differ in specific activity, thermostability, pH optimum,
allergenicity, or the like. The variant sequence may be constructed on the basis of the nucleic acid
sequence presented as the polypeptide encoding part of the polynucleotides of the invention, e.g. 
a subsequence thereof, and/or by introduction of nucleotide substitutions which do not give rise to 
another amino acid sequence of the polypeptide encoded by the nucleic acid sequence, but which 
correspond to the codon usage of the host organism intended for production of the protease, or by 
introduction of nucleotide substitutions which may give rise to a different amino acid sequence. 
For a general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein
Expression and Purification 2: 95-107. Low-allergenic polypeptides can e.g. be prepared as 
described above. 
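
Substitutions that adapt codon usage to the production host without changing the encoded amino acid sequence can be sketched as follows. The standard genetic code is assumed. The `PREFERRED` table is hypothetical and covers only four amino acids; real preferences would have to come from a codon usage table for the chosen host.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical host-preferred codons (assumption, for illustration only).
PREFERRED = {"Q": "CAA", "S": "TCT", "A": "GCT", "P": "CCG"}


def translate(dna: str) -> str:
    """Translate an in-frame, uppercase DNA coding sequence."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonym, where one is listed."""
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons)
```

Because every replacement codon is checked against the codon table first, the recoded sequence translates to the same polypeptide as the input.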

It will be apparent to those skilled in the art that such substitutions can be made outside
the regions critical to the function of the molecule and still result in an active polypeptide. Amino
acid residues essential to the activity of the polypeptide encoded by the isolated nucleic acid
sequence of the invention, and therefore preferably not subject to substitution, may be identified
according to procedures known in the art, such as site-directed mutagenesis or alanine-scanning
mutagenesis (see, e.g., Cunningham and Wells, 1989, Science 244: 1081-1085). In the latter
technique, mutations are introduced at every positively charged residue in the molecule, and the
resultant mutant molecules are tested for protease activity to identify amino acid residues that are
critical to the activity of the molecule. Sites of substrate-protease interaction can also be
determined by analysis of the three-dimensional structure as determined by such techniques as
nuclear magnetic resonance analysis, crystallography or photoaffinity labelling (see, e.g., de Vos
et al., 1992, Science 255: 306-312; Smith et al., 1992, Journal of Molecular Biology 224: 899-904;
Wlodaver et al., 1992, FEBS Letters 309: 59-64).
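
The alanine-scanning step described above can be sketched as the enumeration of single-residue mutants. The sketch assumes that "positively charged" means lysine (K) and arginine (R); whether histidine should be included is left to the caller through the `targets` parameter. The function name and the example peptide are hypothetical, and testing the mutants for protease activity is a laboratory step that is not modelled.

```python
def alanine_scan(protein: str, targets: str = "KR") -> dict:
    """Map mutation names (e.g. 'K3A') to sequences with that residue replaced by alanine.

    Positions are 1-based, following the usual mutation naming convention.
    """
    mutants = {}
    for index, residue in enumerate(protein):
        if residue in targets:
            name = f"{residue}{index + 1}A"
            mutants[name] = protein[:index] + "A" + protein[index + 1:]
    return mutants
```

For the invented peptide `MAKRG`, the scan yields two mutants, `K3A` and `R4A`, each differing from the parent at exactly one position.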

The present invention also relates to isolated nucleic acid sequences encoding a 
polypeptide of the present invention, which hybridize under very low stringency conditions, 
preferably low stringency conditions, more preferably medium stringency conditions, more 
preferably medium-high stringency conditions, even more preferably high stringency conditions,
and most preferably very high stringency conditions with a nucleic acid probe which hybridizes
under the same conditions with the nucleic acid sequence of the invention or its complementary
strand; or allelic variants and subsequences thereof (Sambrook et al., 1989, supra), as defined
herein. 

The present invention also relates to isolated nucleic acid sequences produced by (a) 
hybridizing a DNA under very low, low, medium, medium-high, high, or very high stringency
conditions with (i) a polynucleotide of the invention, (ii) a subsequence of (i), or (iii) a
complementary strand of (i) or (ii); and (b) isolating the nucleic acid sequence. The subsequence
is preferably a sequence of at least 100 nucleotides such as a sequence that encodes a 
polypeptide fragment which has protease activity. 

The introduction of a mutation into the nucleic acid sequence to exchange one nucleotide 
for another nucleotide may be accomplished by site-directed mutagenesis using any of the 
methods known in the art. Particularly useful is the procedure that utilizes a supercoiled, double 
stranded DNA vector with an insert of interest and two synthetic primers containing the desired
mutation. The oligonucleotide primers, each complementary to opposite strands of the vector,
extend during temperature cycling by means of Pfu DNA polymerase. On incorporation of the
primers, a mutated plasmid containing staggered nicks is generated. Following temperature
cycling, the product is treated with DpnI, which is specific for methylated and hemimethylated DNA,
to digest the parental DNA template and to select for mutation-containing synthesized DNA. Other
procedures known in the art may also be used. The invention also relates to an isolated
polynucleotide encoding a polypeptide as defined in the first aspect.
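
The two-primer mutagenesis procedure above relies on a pair of primers that carry the desired change and are complementary to opposite strands. A minimal sketch of designing such a pair for a single-nucleotide exchange follows. The flank length, the function names and the example template are assumptions; practical primer design also weighs melting temperature and GC content, which are not modelled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(template: str, position: int, new_base: str, flank: int = 12):
    """Return (forward, reverse) primers centred on a single-base change.

    position is 0-based on the given strand; flank is the number of
    unchanged nucleotides kept on each side of the mutation.
    """
    if new_base not in "ACGT" or len(new_base) != 1:
        raise ValueError("new_base must be one of A, C, G, T")
    if position - flank < 0 or position + flank + 1 > len(template):
        raise ValueError("not enough flanking sequence around the position")
    mutated = template[:position] + new_base + template[position + 1:]
    forward = mutated[position - flank:position + flank + 1]
    return forward, reverse_complement(forward)
```

For example, changing position 7 of the invented template `ATGGCTAAACGTGGC` to C with a flank of 3 gives the forward primer `CTACACG` and the reverse primer `CGTGTAG`.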

Nucleic Acid Constructs

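The present invention also relates to nucleic acid constructs comprising a nucleic acid
sequence of the present invention operably linked to one or more control sequences that direct
the expression of the coding sequence in a suitable host cell under conditions compatible with the
control sequences. Expression will be understood to include any step involved in the production of
the polypeptide including, but not limited to, transcription, post-transcriptional modification,
translation, post-translational modification, and secretion.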

"Nucleic acid construct" is defined herein as a nucleic acid molecule, either single- or
double-stranded, which is isolated from a naturally occurring gene or which has been modified to
contain segments of nucleic acid combined and juxtaposed in a manner that would not otherwise
exist in nature. The term nucleic acid construct is synonymous with the term expression cassette
when the nucleic acid construct contains all the control sequences required for expression of a
coding sequence of the present invention. The term "coding sequence" is defined herein as a
nucleic acid sequence that directly specifies the amino acid sequence of its protein product. The
boundaries of the coding sequence are generally determined by a ribosome binding site
(prokaryotes) or by the ATG start codon (eukaryotes) located just upstream of the open reading
frame at the 5' end of the mRNA and a transcription terminator sequence located just downstream
of the open reading frame at the 3' end of the mRNA. A coding sequence can include, but is not
limited to, DNA, cDNA, and recombinant nucleic acid sequences.

An isolated nucleic acid sequence encoding a polypeptide of the present invention may be
manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of the
nucleic acid sequence prior to its insertion into a vector may be desirable or necessary depending
on the expression vector. The techniques for modifying nucleic acid sequences utilizing
recombinant DNA methods are well known in the art.

The term "control sequences" is defined herein to include all components that are
necessary or advantageous for the expression of a polypeptide of the present invention. Each
control sequence may be native or foreign to the nucleic acid sequence encoding the polypeptide.
Such control sequences include, but are not limited to, a leader, polyadenylation sequence,
propeptide sequence, promoter, signal peptide sequence, and transcription terminator. At a
minimum, the control sequences include a promoter, and transcriptional and translational stop
signals. The control sequences may be provided with linkers for the purpose of introducing
specific restriction sites facilitating ligation of the control sequences with the coding region of the
nucleic acid sequence encoding a polypeptide. The term "operably linked" is defined herein as a
configuration in which a control sequence is appropriately placed at a position relative to the
coding sequence of the DNA sequence such that the control sequence directs the expression of a
polypeptide.
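
The minimum set of control sequences named above (a promoter plus transcriptional and translational stop signals) can be expressed as a simple completeness check. This is a bookkeeping sketch only; the class name, the field names and the string labels are hypothetical and do not model sequence content or operable linkage.

```python
from dataclasses import dataclass, field

# Minimum control sequences as stated in the text.
REQUIRED_CONTROLS = frozenset({"promoter", "transcriptional_stop", "translational_stop"})
# Further control sequences listed in the text as optional components.
OPTIONAL_CONTROLS = frozenset({"leader", "polyadenylation", "propeptide", "signal_peptide"})


@dataclass
class NucleicAcidConstruct:
    coding_sequence: str
    control_sequences: set = field(default_factory=set)

    def missing_controls(self) -> set:
        """Required control sequences that this construct does not yet carry."""
        return set(REQUIRED_CONTROLS - self.control_sequences)

    def has_minimum_controls(self) -> bool:
        """True once the promoter and both stop signals are present."""
        return not self.missing_controls()
```

A construct carrying only a promoter and a signal peptide coding region would report both stop signals as missing.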

The control sequence may be an appropriate promoter sequence, a nucleic acid sequence 
that is recognized by a host cell for expression of the nucleic acid sequence. The promoter 
sequence contains transcriptional control sequences that mediate the expression of the 
polypeptide. The promoter may be any nucleic acid sequence which shows transcriptional activity 
in the host cell of choice including mutant, truncated, and hybrid promoters, and may be obtained
from genes encoding extracellular or intracellular polypeptides either homologous or heterologous 
to the host cell. 

Examples of suitable promoters for directing the transcription of the nucleic acid constructs 
of the present invention, especially in a bacterial host cell, are the promoters obtained from the E.
coli lac operon, Streptomyces coelicolor agarase gene (dagA), Bacillus subtilis levansucrase gene
(sacB), Bacillus licheniformis alpha-amylase gene (amyL), Bacillus stearothermophilus maltogenic
amylase gene (amyM), Bacillus amyloliquefaciens alpha-amylase gene (amyQ), Bacillus
licheniformis penicillinase gene (penP), Bacillus subtilis xylA and xylB genes, and prokaryotic
beta-lactamase gene (Villa-Kamaroff et al., 1978, Proceedings of the National Academy of
Sciences USA 75: 3727-3731), as well as the tac promoter (DeBoer et al., 1983, Proceedings of
the National Academy of Sciences USA 80: 21-25). Further promoters are described in "Useful
proteins from recombinant bacteria" in Scientific American, 1980, 242: 74-94; and in Sambrook et
al., 1989, supra.

Examples of suitable promoters for directing the transcription of the nucleic acid constructs 
of the present invention in a filamentous fungal host cell are promoters obtained from the genes 
for Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic proteinase, Aspergillus niger
neutral alpha-amylase, Aspergillus niger acid stable alpha-amylase, Aspergillus niger or
Aspergillus awamori glucoamylase (glaA), Rhizomucor miehei lipase, Aspergillus oryzae alkaline
protease, Aspergillus oryzae triose phosphate isomerase, Aspergillus nidulans acetamidase, and
Fusarium oxysporum trypsin-like protease (WO 96/00787), as well as the NA2-tpi promoter (a
hybrid of the promoters from the genes for Aspergillus niger neutral alpha-amylase and
Aspergillus oryzae triose phosphate isomerase), and mutant, truncated, and hybrid promoters
thereof.

In a yeast host, useful promoters are obtained from the genes for Saccharomyces
cerevisiae enolase (ENO-1), Saccharomyces cerevisiae galactokinase (GAL1), Saccharomyces
cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP), and
Saccharomyces cerevisiae 3-phosphoglycerate kinase. Other useful promoters for yeast host cells
are described by Romanos et al., 1992, Yeast 8: 423-488.

The control sequence may also be a suitable transcription terminator sequence, a
sequence recognized by a host cell to terminate transcription. The terminator sequence is
operably linked to the 3' terminus of the nucleic acid sequence encoding the polypeptide. Any
terminator which is functional in the host cell of choice may be used in the present invention.

Preferred terminators for filamentous fungal host cells are obtained from the genes for
Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans
anthranilate synthase, Aspergillus niger alpha-glucosidase, and Fusarium oxysporum trypsin-like
protease.



Preferred terminators for yeast host cells are obtained from the genes for Saccharomyces 
cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), and Saccharomyces
cerevisiae glyceraldehyde-3-phosphate dehydrogenase. Other useful terminators for yeast host
cells are described by Romanos et al., 1992, supra.

Preferred terminators for bacterial host cells, such as a Bacillus host cell, are the
terminators from Bacillus licheniformis alpha-amylase gene (amyL), the Bacillus
stearothermophilus maltogenic amylase gene (amyM), or the Bacillus amyloliquefaciens
alpha-amylase gene (amyQ).

The control sequence may also be a suitable leader sequence, a nontranslated region of 
an mRNA which is important for translation by the host cell. The leader sequence is operably 
linked to the 5' terminus of the nucleic acid sequence encoding the polypeptide. Any leader
sequence that is functional in the host cell of choice may be used in the present invention.

Preferred leaders for filamentous fungal host cells are obtained from the genes for 
Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase.

Suitable leaders for yeast host cells are obtained from the genes for Saccharomyces 
cerevisiae enolase (ENO-1), Saccharomyces cerevisiae 3-phosphoglycerate kinase,
Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol
dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP).

The control sequence may also be a polyadenylation sequence, a sequence operably 
linked to the 3' terminus of the nucleic acid sequence and which, when transcribed, is recognized
by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any
polyadenylation sequence which is functional in the host cell of choice may be used in the present
invention. 

Preferred polyadenylation sequences for filamentous fungal host cells are obtained from
the genes for Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus
nidulans anthranilate synthase, Fusarium oxysporum trypsin-like protease, and Aspergillus niger
alpha-glucosidase.

Useful polyadenylation sequences for yeast host cells are described by Guo and Sherman,
1995, Molecular Cellular Biology 15: 5983-5990.

The control sequence may also be a signal peptide coding region that codes for an amino 
acid sequence linked to the amino terminus of a polypeptide and directs the encoded polypeptide 
into the cell's secretory pathway. The 5' end of the coding sequence of the nucleic acid sequence 
may inherently contain a signal peptide coding region naturally linked in translation reading frame 
with the segment of the coding region which encodes the secreted polypeptide. Alternatively, the 
5' end of the coding sequence may contain a signal peptide coding region which is foreign to the 
coding sequence. The foreign signal peptide coding region may be required where the coding 
sequence does not naturally contain a signal peptide coding region. Alternatively, the foreign 
signal peptide coding region may simply replace the natural signal peptide coding region in order 
to enhance secretion of the polypeptide. However, any signal peptide coding region which directs 
the expressed polypeptide into the secretory pathway of a host cell of choice may be used in the 
present invention. 

Effective signal peptide coding regions for bacterial host cells are the signal peptide coding 
regions obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus 
stearothermophilus alpha-amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis
alpha-amylase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus
subtilis prsA. Further signal peptides are described by Simonen and Palva, 1993, Microbiological
Reviews 57: 109-137. 

Effective signal peptide coding regions for filamentous fungal host cells are the signal 
peptide coding regions obtained from the genes for Aspergillus oryzae TAKA amylase, Aspergillus
niger neutral amylase, Aspergillus niger glucoamylase, Rhizomucor miehei aspartic proteinase,
Humicola insolens cellulase, and Humicola lanuginosa lipase.

Useful signal peptides for yeast host cells are obtained from the genes for Saccharomyces 
cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other useful signal peptide 
coding regions are described by Romanos et al., 1992, supra.

The control sequence may also be a propeptide coding region that codes for an amino acid 
sequence positioned at the amino terminus of a polypeptide. The resultant polypeptide is known 
as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is generally 
inactive and can be converted to a mature active polypeptide by catalytic or autocatalytic cleavage 
of the propeptide from the propolypeptide. The propeptide coding region may be obtained from the 
genes for Bacillus subtilis alkaline protease (aprE), Bacillus subtilis neutral protease (nprT), 
Saccharomyces cerevisiae alpha-factor, Rhizomucor miehei aspartic proteinase, and
Myceliophthora thermophila laccase (WO 95/33836).






In a preferred embodiment, the propeptide coding region is nucleotides 1-498 of SEQ ID 
NO: 1 which encode amino acids -166 to -1 of SEQ ID NO: 2. 

Where both signal peptide and propeptide regions are present at the amino terminus of a 
polypeptide, the propeptide region is positioned next to the amino terminus of a polypeptide and 
the signal peptide region is positioned next to the amino terminus of the propeptide region. 
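
The ordering stated above (signal peptide, then propeptide, then mature polypeptide, reading from the amino terminus) can be sketched as plain string assembly. The function names and the three-residue example segments are invented for illustration and are not sequences from this specification; the two processing steps are modelled simply as removal of a known number of N-terminal residues.

```python
def assemble_precursor(signal_peptide: str, propeptide: str, mature: str) -> str:
    """N-terminal to C-terminal order: signal peptide, propeptide, mature polypeptide."""
    return signal_peptide + propeptide + mature


def process_precursor(precursor: str, signal_length: int, pro_length: int):
    """Return (propolypeptide, mature polypeptide) after the two cleavage steps.

    Step 1 removes the signal peptide on secretion; step 2 removes the
    propeptide to give the mature, active polypeptide.
    """
    if signal_length < 0 or pro_length < 0:
        raise ValueError("segment lengths must be non-negative")
    if signal_length + pro_length >= len(precursor):
        raise ValueError("segments are longer than the precursor")
    propolypeptide = precursor[signal_length:]
    mature = propolypeptide[pro_length:]
    return propolypeptide, mature
```

Assembling the invented segments `MKL`, `APE` and `IVG` gives `MKLAPEIVG`; processing it with lengths 3 and 3 returns the propolypeptide `APEIVG` and the mature form `IVG`.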

It may also be desirable to add regulatory sequences which allow the regulation of the 
expression of the polypeptide relative to the growth of the host cell. Examples of regulatory 
systems are those which cause the expression of the gene to be turned on or off in response to a 
chemical or physical stimulus, including the presence of a regulatory compound. Regulatory 
systems in prokaryotic systems include the lac, tac, and trp operator systems. In yeast, the ADH2
system or GAL1 system may be used. In filamentous fungi, the TAKA alpha-amylase promoter,
Aspergillus niger glucoamylase promoter, and Aspergillus oryzae glucoamylase promoter may be
used as regulatory sequences. Other examples of regulatory sequences are those which allow for
gene amplification. In eukaryotic systems, these include the dihydrofolate reductase gene which is
amplified in the presence of methotrexate, and the metallothionein genes which are amplified with 
heavy metals. In these cases, the nucleic acid sequence encoding the polypeptide would be 
operably linked with the regulatory sequence. 

Expression Vectors

The present invention also relates to recombinant expression vectors comprising a nucleic 
acid sequence of the present invention, a promoter, and transcriptional and translational stop
signals. The various nucleic acid and control sequences described above may be joined together
to produce a recombinant expression vector which may include one or more convenient restriction
sites to allow for insertion or substitution of the nucleic acid sequence encoding the polypeptide at
such sites. Alternatively, the nucleic acid sequence of the present invention may be expressed by 
inserting the nucleic acid sequence or a nucleic acid construct comprising the sequence into an 
appropriate vector for expression. In creating the expression vector, the coding sequence is 
located in the vector so that the coding sequence is operably linked with the appropriate control 
sequences for expression. 

The recombinant expression vector may be any vector (e.g., a plasmid or virus) which can 
be conveniently subjected to recombinant DNA procedures and can bring about the expression of
the nucleic acid sequence. The choice of the vector will typically depend on the compatibility of the
vector with the host cell into which the vector is to be introduced. The vectors may be linear or 
closed circular plasmids. 

The vector may be an autonomously replicating vector, i.e., a vector which exists as an
extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g.,
a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The
vector may contain any means for assuring self-replication. Alternatively, the vector may be one
which, when introduced into the host cell, is integrated into the genome and replicated together
with the chromosome(s) into which it has been integrated. Furthermore, a single vector or plasmid
or two or more vectors or plasmids which together contain the total DNA to be introduced into the
genome of the host cell, or a transposon may be used.

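The vectors of the present invention preferably contain one or more selectable markers
which permit easy selection of transformed cells. A selectable marker is a gene the product of
which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to
auxotrophs, and the like. Examples of bacterial selectable markers are the dal genes from Bacillus
subtilis or Bacillus licheniformis. Suitable markers for yeast host cells are ADE2, HIS3, LEU2,
LYS2, MET3, TRP1, and URA3. Selectable markers for use in a filamentous fungal host cell
include, but are not limited to, amdS (acetamidase), argB (ornithine carbamoyltransferase), bar
(phosphinothricin acetyltransferase), hygB (hygromycin phosphotransferase), niaD (nitrate
reductase), pyrG (orotidine-5'-phosphate decarboxylase), sC (sulfate adenyltransferase), trpC
(anthranilate synthase), as well as equivalents thereof. Preferred for use in an Aspergillus cell are
the amdS and pyrG genes of Aspergillus nidulans or Aspergillus oryzae and the bar gene of
Streptomyces hygroscopicus.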

The vectors of the present invention preferably contain an element(s) that permits stable
...
encoding nucleic acid sequences. On the other hand, the vector may be integrated into the
genome of the host cell by non-homologous recombination. 

For autonomous replication, the vector may further comprise an origin of replication 
enabling the vector to replicate autonomously in the host cell in question. Examples of bacterial 
origins of replication are the origins of replication of plasmids pBR322, pUC19, pACYC177, and
pACYC184 permitting replication in E. coli, and pUB110, pE194, pTA1060, and pAMβ1 permitting
replication in Bacillus. Examples of origins of replication for use in a yeast host cell are the 2
micron origin of replication, ARS1, ARS4, the combination of ARS1 and CEN3, and the
combination of ARS4 and CEN6. The origin of replication may be one having a mutation which
makes its functioning temperature-sensitive in the host cell (see, e.g., Ehrlich, 1978, Proceedings of
the National Academy of Sciences USA 75: 1433). 

More than one copy of a nucleic acid sequence of the present invention may be inserted 
into the host cell to increase production of the gene product. An increase in the copy number of
the nucleic acid sequence can be obtained by integrating at least one additional copy of the 
sequence into the host cell genome or by including an amplifiable selectable marker gene with the 
nucleic acid sequence where cells containing amplified copies of the selectable marker gene, and 
thereby additional copies of the nucleic acid sequence, can be selected for by cultivating the cells 
in the presence of the appropriate selectable agent. 

The procedures used to ligate the elements described above to construct the recombinant 
expression vectors of the present invention are well known to one skilled in the art (see, e.g., 
Sambrook et al., 1989, supra).

The protease may also be co-expressed together with at least one other enzyme of 
interest for animal feed, such as phytase (EC 3.1.3.8 or 3.1.3.26); xylanase (EC 3.2.1.8); 
galactanase (EC 3.2.1.89); alpha-galactosidase (EC 3.2.1.22); protease (EC 3.4.-.-);
phospholipase A1 (EC 3.1.1.32); phospholipase A2 (EC 3.1.1.4); lysophospholipase (EC 3.1.1.5);
phospholipase C (EC 3.1.4.3); phospholipase D (EC 3.1.4.4); and/or beta-glucanase (EC 3.2.1.4 or
EC 3.2.1.6). 

The enzymes may be co-expressed from different vectors, from one vector, or using a 
mixture of both techniques. When using different vectors, the vectors may have different 
selectable markers, and different origins of replication. When using only one vector, the genes can 
be expressed from one or more promoters. If cloned under the regulation of one promoter (di- or 
multi-cistronic), the order in which the genes are cloned may affect the expression levels of the
proteins. The protease may also be expressed as a fusion protein, i.e. that the gene encoding the 
protease has been fused in frame to the gene encoding another protein. This protein may be 
another enzyme or a functional domain from another enzyme. 
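
Fusing two genes "in frame" means that the reading frame of the upstream gene continues unbroken into the downstream gene. A minimal sketch follows, assuming uppercase coding sequences that each start at their first codon; the function name and the example sequences are hypothetical, and any linker or cleavage site between the partners is left out.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream_cds: str, downstream_cds: str) -> str:
    """Join two coding sequences so the downstream one stays in the same frame.

    A terminal stop codon on the upstream sequence is removed, since it
    would otherwise end translation before the fusion partner.
    """
    for label, cds in (("upstream", upstream_cds), ("downstream", downstream_cds)):
        if len(cds) == 0 or len(cds) % 3 != 0:
            raise ValueError(f"{label} coding sequence is not a whole number of codons")
    if upstream_cds[-3:] in STOP_CODONS:
        upstream_cds = upstream_cds[:-3]
    return upstream_cds + downstream_cds
```

The length check is what guarantees the frame: if the upstream sequence were not a whole number of codons, every downstream codon would be shifted and a different polypeptide would result.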

Accordingly, the invention also relates to a recombinant expression vector or
polynucleotide construct comprising a polynucleotide of the invention. 

Host Cells 

The present invention also relates to recombinant host cells, comprising a nucleic acid 
sequence of the invention, which are advantageously used in the recombinant production of the
polypeptides. A vector comprising a nucleic acid sequence of the present invention is introduced 
into a host cell so that the vector is maintained as a chromosomal integrant or as a self-replicating 
extra-chromosomal vector as described earlier. The term "host cell" encompasses any progeny of 
a parent cell that is not identical to the parent cell due to mutations that occur during replication. 
The choice of a host cell will to a large extent depend upon the gene encoding the polypeptide
and its source. The host cell may be a unicellular microorganism, e.g., a prokaryote, or a
non-unicellular microorganism, e.g., a eukaryote.

Useful unicellular cells are bacterial cells such as gram positive bacteria including, but not 
limited to, a Bacillus cell, or a Streptomyces cell, or cells of lactic acid bacteria; or gram negative 
bacteria such as E. coli and Pseudomonas sp. Lactic acid bacteria include, but are not limited to,
species of the genera Lactococcus, Lactobacillus, Leuconostoc, Streptococcus, Pediococcus, and
Enterococcus. Useful unicellular cells are bacterial cells such as gram positive bacteria including,
but not limited to, a Bacillus cell, e.g., Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus
brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus lautus, Bacillus lentus,
Bacillus licheniformis, Bacillus megaterium, Bacillus stearothermophilus, Bacillus subtilis, and
Bacillus thuringiensis; or a Streptomyces cell, e.g., Streptomyces lividans or Streptomyces
murinus, or gram negative bacteria such as E. coli and Pseudomonas sp. In a preferred
embodiment, the bacterial host cell is a Bacillus lentus, Bacillus licheniformis, Bacillus
stearothermophilus or Bacillus subtilis cell. In another preferred embodiment, the Bacillus cell is 
an alkalophilic Bacillus. 

The introduction of a vector into a bacterial host cell may, for instance, be effected by
protoplast transformation (see, e.g., Chang and Cohen, 1979, Molecular General Genetics 168:
111-115), using competent cells (see, e.g., Young and Spizizin, 1961, Journal of Bacteriology 81:
823-829, or Dubnau and Davidoff-Abelson, 1971, Journal of Molecular Biology 56: 209-221),
electroporation (see, e.g., Shigekawa and Dower, 1988, Biotechniques 6: 742-751), or conjugation
(see, e.g., Koehler and Thorne, 1987, Journal of Bacteriology 169: 5771-5278). The host cell may
be a eukaryote, such as a non-human animal cell, an insect cell, a plant cell, or a fungal cell. In 
one particular embodiment, the host cell is a fungal cell. "Fungi" as used herein includes the phyla 
Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota (as defined by Hawksworth et al.,
In, Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University
Press, Cambridge, UK) as well as the Oomycota (as cited in Hawksworth et al., 1995, supra, page
171) and all mitosporic fungi (Hawksworth et al., 1995, supra).

In another particular embodiment, the fungal host cell is a yeast cell. "Yeast" as used 
herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast 
belonging to the Fungi Imperfecti (Blastomycetes). Since the classification of yeast may change in
the future, for the purposes of this invention, yeast shall be defined as described in Biology and 
Activities of Yeast (Skinner, FA., Passmore, S.M., and Davenport, R.R., eds, Soc. App. Bacteriol. 
Symposium Series No. 9, 1980). 

The yeast host cell may be a Candida, Hansenula, Kluyveromyces, Pichia, 
Saccharomyces, Schizosaccharomyces, or Yarrowia cell. 

The fungal host cell may be a filamentous fungal cell. "Filamentous fungi" include all 
filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al.,
1995, supra). The filamentous fungi are characterized by a mycelial wall composed of chitin,
cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by
hyphal elongation and carbon catabolism is obligately aerobic. In contrast, vegetative growth by 
yeasts such as Saccharomyces cerevisiae is by budding of a unicellular thallus and carbon 
catabolism may be fermentative. 

Examples of filamentous fungal host cells are cells of species of, but not limited to, 
Acremonium, Aspergillus, Fusarium, Humicola, Mucor, Myceliophthora, Neurospora, Penicillium,
Thielavia, Tolypocladium, or Trichoderma.

Fungal cells may be transformed by a process involving protoplast formation, 
transformation of the protoplasts, and regeneration of the cell wall in a manner known per se. 
Suitable procedures for transformation of Aspergillus host cells are described in EP 238 023 and 
Yelton et al., 1984, Proceedings of the National Academy of Sciences USA 81: 1470-1474.
Suitable methods for transforming Fusarium species are described by Malardier et al., 1989, Gene
78: 147-156 and WO 96/00787. Yeast may be transformed using the procedures described by
Becker and Guarente, in Abelson, J.N. and Simon, M.I., editors, Guide to Yeast Genetics and
Molecular Biology, Methods in Enzymology, Volume 194, pp 182-187, Academic Press, Inc., New
York; Ito et al., 1983, Journal of Bacteriology 153: 163; and Hinnen et al., 1978, Proceedings of
the National Academy of Sciences USA 75: 1920. 

The invention relates to a recombinant host cell comprising a polynucleotide of the 
invention, or an expression vector or polynucleotide construct of the invention. In a preferred 
embodiment, the recombinant host cell is a Bacillus cell.

Plants

The present invention also relates to a transgenic plant, plant part, or plant cell which has 
been transformed with a nucleic acid sequence encoding a polypeptide having protease activity of 
the present invention so as to express and produce the polypeptide in recoverable quantities. The
polypeptide may be recovered from the plant or plant part. Alternatively, the plant or plant part
containing the recombinant polypeptide may be used as such for improving the quality of a food or
feed, e.g., improving nutritional value, palatability, and rheological properties, or to destroy an
antinutritive factor. 

In a particular embodiment, the polypeptide is targeted to the endosperm storage 
vacuoles in seeds. This can be obtained by synthesizing it as a precursor with a suitable signal
peptide, see Horvath et al. in PNAS, Feb. 15, 2000, vol. 97, no. 4, p. 1914-1919.

The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a monocot) 
or engineered variants thereof. Examples of monocot plants are grasses, such as meadow grass 
(blue grass, Poa), forage grass such as Festuca, Lolium, temperate grass, such as Agrostis, and
cereals, e.g., wheat, oats, rye, barley, rice, sorghum, and maize (corn). Examples of dicot plants
are tobacco, legumes, such as lupins, potato, sugar beet, pea, bean and soybean, and cruciferous
plants (family Brassicaceae), such as cauliflower, rape seed, and the closely related model
organism Arabidopsis thaliana. Low-phytate plants as described e.g. in US patent no. 5,689,054
and US patent no. 6,111,168 are examples of engineered plants.

Examples of plant parts are stem, callus, leaves, root, fruits, seeds, and tubers. Also 
specific plant tissues, such as chloroplast, apoplast, mitochondria, vacuole, peroxisomes and
cytoplasm are considered to be a plant part. Furthermore, any plant cell, whatever the tissue
origin, is considered to be a plant part. 

Also included within the scope of the present invention are the progeny of such plants,
plant parts and plant cells.

The transgenic plant or plant cell expressing a polypeptide of the present invention may 
be constructed in accordance with methods known in the art. Briefly, the plant or plant cell is 
constructed by incorporating one or more expression constructs encoding a polypeptide of the 
present invention into the plant host genome and propagating the resulting modified plant or plant 
cell into a transgenic plant or plant cell. 

Conveniently, the expression construct is a nucleic acid construct which comprises a 
nucleic acid sequence encoding a polypeptide of the present invention operably linked with
appropriate regulatory sequences required for expression of the nucleic acid sequence in the plant
or plant part of choice. Furthermore, the expression construct may comprise a selectable marker
useful for identifying host cells into which the expression construct has been integrated and DNA
sequences necessary for introduction of the construct into the plant in question (the latter depends
on the DNA introduction method to be used). 

The choice of regulatory sequences, such as promoter and terminator sequences and
optionally signal or transit sequences, is determined, for example, on the basis of when, where
and how the polypeptide is desired to be expressed. For instance, the expression of the gene
encoding a polypeptide of the present invention may be constitutive or inducible, or may be
developmental, stage or tissue specific, and the gene product may be targeted to a specific tissue
or plant part such as seeds or leaves. Regulatory sequences are, for example, described by
Tague et al., 1988, Plant Physiology 86: 506.

For constitutive expression, the 35S-CaMV promoter may be used (Franck et al., 1980,
Cell 21: 285-294). Organ-specific promoters may be, for example, a promoter from storage sink
tissues such as seeds, potato tubers, and fruits (Edwards & Coruzzi, 1990, Ann. Rev. Genet. 24:
275-303), or from metabolic sink tissues such as meristems (Ito et al., 1994, Plant Mol. Biol. 24:
863-878), a seed specific promoter such as the glutelin, prolamin, globulin, or albumin promoter
from rice (Wu et al., 1998, Plant and Cell Physiology 39: 885-889), a Vicia faba promoter from the
legumin B4 and the unknown seed protein gene from Vicia faba (Conrad et al., 1998, Journal of
Plant Physiology 152: 708-711), a promoter from a seed oil body protein (Chen et al., 1998, Plant
and Cell Physiology 39: 935-941), the storage protein napA promoter from Brassica napus, or any
other seed specific promoter known in the art, e.g., as described in WO 91/14772. Furthermore,
the promoter may be a leaf specific promoter such as the rbcs promoter from rice or tomato
(Kyozuka et al., 1993, Plant Physiology 102: 991-1000), the chlorella virus adenine
methyltransferase gene promoter (Mitra and Higgins, 1994, Plant Molecular Biology 26: 85-93), or
the aldP gene promoter from rice (Kagaya et al., 1995, Molecular and General Genetics 248: 668-674),
or a wound inducible promoter such as the potato pin2 promoter (Xu et al., 1993, Plant
Molecular Biology 22: 573-588).

A promoter enhancer element may also be used to achieve higher expression of the 
protease in the plant. For instance, the promoter enhancer element may be an intron which is 
placed between the promoter and the nucleotide sequence encoding a polypeptide of the present
invention. For instance, Xu et al., 1993, supra disclose the use of the first intron of the rice actin 1
gene to enhance expression.

Still further, the codon usage may be optimized for the plant species in question to 
improve expression (see Horvath et al. referred to above).

The selectable marker gene and any other parts of the expression construct may be 
chosen from those available in the art.

The nucleic acid construct is incorporated into the plant genome according to
conventional techniques known in the art, including Agrobacterium-mediated transformation,
virus-mediated transformation, microinjection, particle bombardment, biolistic transformation, and
electroporation (Gasser et al., 1990, Science 244: 1293; Potrykus, 1990, Bio/Technology 8: 535;
Shimamoto et al., 1989, Nature 338: 274).

Presently, Agrobacterium tumefaciens-mediated gene transfer is the method of choice for
generating transgenic dicots (for a review, see Hooykas and Schilperoort, 1992, Plant Molecular
Biology 19: 15-38). However, it can also be used for transforming monocots, although other
transformation methods are generally preferred for these plants. Presently, the method of choice
for generating transgenic monocots is particle bombardment (microscopic gold or tungsten
particles coated with the transforming DNA) of embryonic calli or developing embryos (Christou,
1992, Plant Journal 2: 275-281; Shimamoto, 1994, Current Opinion Biotechnology 5: 158-162;
Vasil et al., 1992, Biotechnology 10: 667-674). An alternative method for transformation of
monocots is based on protoplast transformation as described by Omirulleh et al., 1993, Plant
Molecular Biology 21: 415-428. 

Following transformation, the transformants having incorporated therein the expression 
construct are selected and regenerated into whole plants according to methods well-known in the
art.

The present invention also relates to methods for producing a polypeptide of the present
invention comprising (a) cultivating a transgenic plant or a plant cell comprising a nucleic acid
sequence encoding a polypeptide having protease activity of the present invention under 
conditions conducive for production of the polypeptide; and (b) recovering the polypeptide. The 
invention relates to a transgenic plant, or plant part, comprising a polynucleotide as defined in 
claim 8, or an expression vector or polynucleotide construct of the invention. 

Animals 

The present invention also relates to a transgenic, non-human animal and products or 
elements thereof, examples of which are body fluids such as milk and blood, organs, flesh, and 
animal cells. Techniques for expressing proteins, e.g. in mammalian cells, are known in the art,
see e.g. the handbook Protein Expression: A Practical Approach, Higgins and Hames (eds),
Oxford University Press (1999), and the three other handbooks in this series relating to Gene
Transcription, RNA Processing, and Post-translational Processing. Generally speaking, to prepare
a transgenic animal, selected cells of a selected animal are transformed with a nucleic acid
sequence encoding a polypeptide having protease activity of the present invention so as to
express and produce the polypeptide. The polypeptide may be recovered from the animal, e.g.
from the milk of female animals, or the polypeptide may be expressed to the benefit of the animal
itself, e.g. to assist the animal's digestion. Examples of animals are mentioned below in the 
section headed Animal Feed. 

To produce a transgenic animal with a view to recovering the protease from the milk of the 
animal, a gene encoding the protease may be inserted into the fertilized eggs of an animal in 
question, e.g. by use of a transgene expression vector which comprises a suitable milk protein
promoter, and the gene encoding the protease. The transgene expression vector is microinjected
into fertilized eggs, and preferably permanently integrated into the chromosome. Once the egg
begins to grow and divide, the potential embryo is implanted into a surrogate mother, and animals
carrying the transgene are identified. The resulting animal can then be multiplied by conventional
breeding. The polypeptide may be purified from the animal's milk, see e.g. Meade, H.M. et al.
(1999): Expression of recombinant proteins in the milk of transgenic animals. Gene expression
systems: Using nature for the art of expression. J. M. Fernandez and J. P. Hoeffler (eds),
Academic Press. 

In the alternative, in order to produce a transgenic non-human animal that carries in the 
genome of its somatic and/or germ cells a nucleic acid sequence including a heterologous 
transgene construct including a transgene encoding the protease, the transgene may be operably 
linked to a first regulatory sequence for salivary gland specific expression of the protease as 
disclosed in WO 2000064247. 

The invention relates to a transgenic, non-human animal, or products, or elements thereof,
comprising a polynucleotide, or an expression vector or polynucleotide construct of the invention.

Methods of Production 

The present invention also relates to methods for producing a polypeptide of the present 
invention comprising (a) cultivating a host cell or a transgenic plant or animal under conditions 
conducive for production of the polypeptide in a supernatant; and optionally (b) recovering the 
polypeptide. 

In the production methods of the present invention, the cells are cultivated in a nutrient 
medium suitable for production of the polypeptide using methods known in the art. For example,
the cell may be cultivated by shake flask cultivation, small-scale or large-scale fermentation
(including continuous, batch, fed-batch, or solid state fermentations) in laboratory or industrial
fermentors performed in a suitable medium and under conditions allowing the polypeptide to be
expressed and/or isolated. The cultivation takes place in a suitable nutrient medium comprising
carbon and nitrogen sources and inorganic salts, using procedures known in the art. Suitable
media are available from commercial suppliers or may be prepared according to published
compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is
secreted into the nutrient medium, the polypeptide can be recovered directly from the medium. If 
the polypeptide is not secreted, it can be recovered from cell lysates. 

The polypeptides may be detected using methods known in the art that are specific for the 
polypeptides. These detection methods may include use of specific antibodies, formation of a 
product, or disappearance of a substrate. For example, a protease assay may be used to 
determine the activity of the polypeptide as described herein. 

The resulting polypeptide may be recovered by methods known in the art. For example, 
the polypeptide may be recovered from the nutrient medium by conventional procedures including, 
but not limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation. 

The polypeptides of the present invention may be purified by a variety of procedures 
known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity, 
hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g., preparative 
isoelectric focusing), differential solubility (e.g., ammonium sulfate precipitation), SDS-PAGE, or 
extraction (see, e.g., Protein Purification, J.-C. Janson and Lars Ryden, editors, VCH Publishers, 
New York, 1989). 

Compositions 

In a still further aspect, the present invention relates to compositions comprising a 
polypeptide of the present invention. The polypeptide compositions may be prepared in 
accordance with methods known in the art and may be in the form of a liquid or a dry composition. 
For instance, the polypeptide composition may be in the form of a granulate or a microgranulate. 
The polypeptide to be included in the composition may be stabilized in accordance with methods 
known in the art. Examples are given below of preferred uses of the polypeptides or polypeptide 
compositions of the invention. 

Animal Feed 

The present invention is also directed to methods for using the polypeptides of the 
invention in animal feed, as well as to feed compositions and feed additives comprising the 
polypeptides of the invention. The term animal includes all animals, including human beings. 
Examples of animals are non-ruminants, and ruminants, such as cows, sheep and horses. In a 
particular embodiment, the animal is a non-ruminant animal. Non-ruminant animals include
mono-gastric animals, e.g. pigs or swine (including, but not limited to, piglets, growing pigs, and sows);
poultry such as turkeys, ducks and chicken (including but not limited to broiler chicks, layers);
young calves; fish (including but not limited to salmon, trout, tilapia, catfish and carps); and
crustaceans (including but not limited to shrimps and prawns).

The term feed or feed composition means any compound, preparation, mixture, or 
composition suitable for, or intended for intake by an animal. 

In the use according to the invention the protease can be fed to the animal before, after, or 
simultaneously with the diet. The latter is preferred. 

In a particular embodiment, the protease, in the form in which it is added to the feed, or
when being included in a feed additive, is well-defined. Well-defined means that the protease
preparation is at least 50% pure as determined by size-exclusion chromatography (see Example
12 of WO 01/58275). In other particular embodiments the protease preparation is at least 60, 70,
80, 85, 88, 90, 92, 94, or at least 95% pure as determined by this method. A well-defined protease
preparation is advantageous. For instance, it is much easier to dose correctly to the feed a
protease that is essentially free from interfering or contaminating other proteases. The term dose
correctly refers in particular to the objective of obtaining consistent and constant results, and the
capability of optimising dosage based upon the desired effect.

For the use in animal feed, however, the protease need not be that pure; it may e.g. 
include other enzymes, in which case it could be termed a protease preparation. The protease 
preparation can be (a) added directly to the feed (or used directly in a treatment process of 
vegetable proteins), or (b) it can be used in the production of one or more intermediate 
compositions such as feed additives or premixes that are subsequently added to the feed (or used
in a treatment process). The degree of purity described above refers to the purity of the original 
protease preparation, whether used according to (a) or (b) above. 

Protease preparations with purities of this order of magnitude are in particular obtainable 
using recombinant methods of production, whereas they are not so easily obtained and are also
subject to a much higher batch-to-batch variation when the protease is produced by traditional
fermentation methods. Such protease preparation may of course be mixed with other enzymes. 

In a particular embodiment, the protease for use according to the invention is capable of 
solubilising vegetable proteins. A suitable assay for determining solubilised protein is disclosed in 
Example 11. 

The term vegetable proteins as used herein refers to any compound, composition, 
preparation or mixture that includes at least one protein derived from or originating from a 
vegetable, including modified proteins and protein-derivatives. In particular embodiments, the 
protein content of the vegetable proteins is at least 10, 20, 30, 40, 50, or 60% (w/w). Vegetable
proteins may be derived from vegetable protein sources, such as legumes and cereals, for
example materials from plants of the families Fabaceae (Leguminosae), Cruciferaceae,
Chenopodiaceae, and Poaceae, such as soy bean meal, lupin meal and rapeseed meal. In a
particular embodiment, the vegetable protein source is material from one or more plants of the 
family Fabaceae, e.g. soybean, lupine, pea, or bean. 

In another particular embodiment, the vegetable protein source is material from one or 
more plants of the family Chenopodiaceae, e.g. beet, sugar beet, spinach or quinoa. Other 
examples of vegetable protein sources are rapeseed and cabbage. Soybean is a preferred
vegetable protein source. Other examples of vegetable protein sources are cereals such as
barley, wheat, rye, oat, maize (corn), rice, and sorghum.

The treatment according to the invention of vegetable proteins with at least one protease of 
the invention results in an increased solubilisation of vegetable proteins. The following are 
examples of % solubilised protein obtainable using the proteases of the invention in a monogastric 
in vitro model: At least 102%, 103%, 104%, 105%, 106%, or at least 107%, relative to a blank.
The percentage of solubilised protein is determined using the monogastric in vitro model of
Example 11. The term solubilisation of proteins basically means bringing protein(s) into solution.
Such solubilisation may be due to protease-mediated release of protein from other components of
the usually complex natural compositions such as feed. 

In a further particular embodiment, the protease for use according to the invention is 
capable of increasing the amount of digestible vegetable proteins. The following are examples of 
% digested or digestible protein obtainable using the proteases of the invention in a monogastric 
in vitro model: At least 104%, 105%, 106%, 107%, 108%, 109%, or at least 110%, relative to a
blank. The percentage of digested or digestible protein is determined using the in vitro model of 
Example 11. 

The following are examples of % digested or digestible protein obtainable using the 
proteases of the invention in an aquaculture in vitro model: At least 103%, 104%, 105%, 106%,
107%, 108%, 109% or at least 110%, relative to a blank. The percentage of digested or digestible
protein is determined using the aquaculture in vitro model of Example 12. 

In a still further particular embodiment, the protease for use according to the invention is 
capable of increasing the Degree of Hydrolysis (DH) of vegetable proteins. The following are 
examples of Degree of Hydrolysis increase obtainable in a monogastric in vitro model: At least
102%, 103%, 104%, 105%, 106%, or at least 107%, relative to a blank. The DH is determined
using the monogastric in vitro model of Example 11. The following are examples of Degree of
Hydrolysis increase obtainable in an aquaculture in vitro model: At least 102%, 103%, 104%,
105%, 106%, or at least 107%, relative to a blank. The DH is determined using the aquaculture in
vitro model of Example 12.

In a particular embodiment of a (pre-) treatment process of the invention, the protease(s) in
question is affecting (or acting on, or exerting its solubilising influence on) the vegetable proteins
or protein sources. To achieve this, the vegetable protein or protein source is typically suspended
in a solvent, e.g. an aqueous solvent such as water, and the pH and temperature values are
adjusted paying due regard to the characteristics of the enzyme in question. For example, the
treatment may take place at a pH-value at which the activity of the actual protease is at least
40%, 50%, 60%, 70%, 80% or at least 90%. Likewise, for example, the treatment may take
place at a temperature at which the activity of the actual protease is at least 40%, 50%, 60%, 
70%, 80% or at least 90%. The above percentage activity indications are relative to the maximum 
activities. The enzymatic reaction is continued until the desired result is achieved, following which 
it may or may not be stopped by inactivating the enzyme, e.g. by a heat-treatment step. 
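
The selection of treatment conditions described above can be illustrated with a minimal sketch. It assumes that a pH (or temperature) activity profile has already been measured for the protease in question; the function name, the profile values and the units are hypothetical and are not taken from this specification. Activities are expressed relative to the maximum activity in the profile, as stated above.

```python
def conditions_meeting_threshold(profile, min_relative_activity=40.0):
    """Return the conditions (e.g. pH values) at which the activity is at
    least `min_relative_activity` percent of the maximum measured activity.

    `profile` maps a condition (pH or temperature) to a measured activity
    in arbitrary units. The profile itself is an assumed input.
    """
    if not profile:
        raise ValueError("activity profile is empty")
    peak = max(profile.values())
    if peak <= 0:
        raise ValueError("maximum activity must be positive")
    return sorted(
        condition
        for condition, activity in profile.items()
        if 100.0 * activity / peak >= min_relative_activity
    )


# Hypothetical pH profile (activity in arbitrary units), for illustration only.
example_ph_profile = {6.0: 10.0, 7.0: 45.0, 8.0: 80.0, 9.0: 100.0, 10.0: 95.0, 11.0: 30.0}
suitable_ph = conditions_meeting_threshold(example_ph_profile, min_relative_activity=40.0)
```

The same helper can be applied to a temperature profile, or called with a stricter threshold such as 90.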

In another particular embodiment of a treatment process of the invention, the protease 
action is sustained, meaning e.g. that the protease is added to the vegetable proteins or protein 
sources, but its solubilising influence is so to speak not switched on until later when desired, once 
suitable solubilising conditions are established, or once any enzyme inhibitors are inactivated, or 
whatever other means could have been applied to postpone the action of the enzyme. 

In one embodiment the treatment is a pre-treatment of animal feed or vegetable proteins 
for use in animal feed. 

The term improving the nutritional value of an animal feed means improving the availability 
and/or digestibility of the proteins, thereby leading to increased protein extraction from the diet 
components, higher protein yields, increased protein degradation and/or improved protein 
utilisation. The nutritional value of the feed is therefore increased, and the animal performance 
such as growth rate and/or weight gain and/or feed conversion ratio (i.e. the weight of ingested 
feed relative to weight gain) of the animal is/are improved. 

In a particular embodiment the feed conversion ratio is increased by at least 1%, 2%, 3%, 
4%, 5%, 6%, 7%, 8%, 9% or at least 10%. In a further particular embodiment the weight gain is
increased by at least 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10% or at least 11%. These figures are
relative to control experiments with no protease addition. 

The feed conversion ratio (FCR) and the weight gain may be calculated as described in 
EEC (1986): Directive de la Commission du 9 avril 1986 fixant la methode de calcul de la valeur
energetique des aliments composes destines a la volaille. Journal Officiel des Communautes
Europeennes, L130, 53 - 54.
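
As a simple illustration of the quantities discussed above, the following sketch computes a feed conversion ratio using the definition given in this section (weight of ingested feed relative to weight gain) and a percentage change relative to a control without protease. It does not reproduce the calculation method of the cited EEC directive; the function names and the trial figures are hypothetical.

```python
def feed_conversion_ratio(feed_intake_kg, weight_gain_kg):
    """Weight of ingested feed relative to weight gain (both in kg)."""
    if weight_gain_kg <= 0:
        raise ValueError("weight gain must be positive")
    return feed_intake_kg / weight_gain_kg


def percent_change_vs_control(treated, control):
    """Change of a performance figure relative to a control experiment
    with no protease addition, in percent of the control value."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return 100.0 * (treated - control) / control


# Hypothetical trial figures, for illustration only.
control_fcr = feed_conversion_ratio(feed_intake_kg=3.6, weight_gain_kg=2.0)
weight_gain_change = percent_change_vs_control(treated=2.1, control=2.0)
```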

The protease can be added to the feed in any form, be it as a relatively pure protease, or in 
admixture with other components intended for addition to animal feed, i.e. in the form of animal 
feed additives, such as the so-called pre-mixes for animal feed.

In a further aspect the present invention relates to compositions for use in animal feed,
such as animal feed, and animal feed additives, e.g. premixes.

Apart from the protease of the invention, the animal feed additives of the invention contain 
at least one fat-soluble vitamin, and/or at least one water soluble vitamin, and/or at least one trace 
mineral. The feed additive may also contain at least one macro mineral.

Further, optional, feed-additive ingredients are colouring agents, aroma compounds, 
stabilisers, antimicrobial peptides, including antifungal polypeptides, and/or at least one other 
enzyme selected from amongst phytase (EC 3.1.3.8 or 3.1.3.26); xylanase (EC 3.2.1.8); 
galactanase (EC 3.2.1.89); alpha-galactosidase (EC 3.2.1.22); protease (EC 3.4.-.-);
phospholipase A1 (EC 3.1.1.32); phospholipase A2 (EC 3.1.1.4); lysophospholipase (EC 3.1.1.5);
phospholipase C (EC 3.1.4.3); phospholipase D (EC 3.1.4.4); and/or beta-glucanase (EC 3.2.1.4 or
EC 3.2.1.6). 

In a particular embodiment these other enzymes are well-defined (as defined above for 
protease preparations). 

Examples of antimicrobial peptides (AMP's) are CAP18, Leucocin A, Tritrpticin, Protegrin-1,
Thanatin, Defensin, Lactoferrin, Lactoferricin, and Ovispirin such as Novispirin (Robert Lehrer,
2000), Plectasins, and Statins, including the compounds and polypeptides disclosed in
PCT/DK02/00781 and PCT/DK02/00812, as well as variants or fragments of the above that retain 
antimicrobial activity. 

Examples of antifungal polypeptides (AFP's) are the Aspergillus giganteus, and Aspergillus
niger peptides, as well as variants and fragments thereof which retain antifungal activity, as
disclosed in WO 94/01459 and WO 02/090384.

Usually fat- and water-soluble vitamins, as well as trace minerals, form part of a so-called
premix intended for addition to the feed, whereas macro minerals are usually separately added to
the feed. A premix enriched with a protease of the invention is an example of an animal feed
additive of the invention.

In a particular embodiment, the animal feed additive of the invention is intended for being 
included (or prescribed as having to be included) in animal diets or feed at levels of 0.01 to 10.0%; 
more particularly 0.05 to 5.0%; or 0.2 to 1.0% (% meaning g additive per 100 g feed). This is so in 
particular for premixes.
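
The inclusion levels above translate directly into an additive mass for a given feed batch. The sketch below assumes that the percentage refers to the mass of the finished feed; the function name and the batch size are hypothetical.

```python
def additive_mass_g(feed_mass_kg, inclusion_percent):
    """Mass of feed additive (g) for a feed batch, where the inclusion
    level is given in % (g additive per 100 g feed).

    Only the broadest range stated above (0.01 to 10.0%) is accepted.
    """
    if not 0.01 <= inclusion_percent <= 10.0:
        raise ValueError("inclusion level outside the 0.01-10.0% range")
    if feed_mass_kg < 0:
        raise ValueError("feed mass must be non-negative")
    feed_mass_g = feed_mass_kg * 1000.0
    return feed_mass_g * inclusion_percent / 100.0


# Hypothetical batch: 1000 kg of feed with a premix included at 0.5%.
premix_needed_g = additive_mass_g(feed_mass_kg=1000.0, inclusion_percent=0.5)
```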

The following are non-exclusive lists of examples of these components: 

Examples of fat-soluble vitamins are vitamin A, vitamin D3, vitamin E, and vitamin K, e.g.
vitamin K3.

Examples of water-soluble vitamins are vitamin B12, biotin and choline, vitamin B1, vitamin
B2, vitamin B6, niacin, folic acid and pantothenate, e.g. Ca-D-pantothenate.

Examples of trace minerals are manganese, zinc, iron, copper, iodine, selenium, and
cobalt.

Examples of macro minerals are calcium, phosphorus and sodium.

The nutritional requirements of these components (exemplified with poultry and
piglets/pigs) are listed in Table A of WO 01/58275. Nutritional requirement means that these 
components should be provided in the diet in the concentrations indicated. 

In the alternative, the animal feed additive of the invention comprises at least one of the 
individual components specified in Table A of WO 01/58275. At least one means either of, one or 
more of, one, or two, or three, or four and so forth up to all thirteen, or up to all fifteen individual 
components. More specifically, this at least one individual component is included in the additive of 
the invention in such an amount as to provide an in-feed-concentration within the range indicated 
in column four, or column five, or column six of Table A.

The present invention also relates to animal feed compositions. Animal feed compositions 
or diets have a relatively high content of protein. Poultry and pig diets can be characterised as 
indicated in Table B of WO 01/58275, columns 2-3. Fish diets can be characterised as indicated in
column 4 of this Table B. Furthermore such fish diets usually have a crude fat content of 200-310 
g/kg. WO 01/58275 corresponds to US 09/779334 which is hereby incorporated by reference. 

An animal feed composition according to the invention has a crude protein content of 50- 
800 g/kg, and furthermore comprises at least one protease as claimed herein. 

Furthermore, or in the alternative (to the crude protein content indicated above), the animal 
feed composition of the invention has a content of metabolisable energy of 10-30 MJ/kg; and/or a 
content of calcium of 0.1-200 g/kg; and/or a content of available phosphorus of 0.1-200 g/kg; 
and/or a content of methionine of 0.1-100 g/kg; and/or a content of methionine plus cysteine of 
0.1-150 g/kg; and/or a content of lysine of 0.5-50 g/kg.

In particular embodiments, the content of metabolisable energy, crude protein, calcium, 
phosphorus, methionine, methionine plus cysteine, and/or lysine is within any one of ranges 2, 3, 
4 or 5 in Table B of WO 01/58275 (R. 2-5). 

Crude protein is calculated as nitrogen (N) multiplied by a factor 6.25, i.e. Crude protein
(g/kg) = N (g/kg) x 6.25. The nitrogen content is determined by the Kjeldahl method (A.O.A.C.,
1984, Official Methods of Analysis, 14th ed., Association of Official Analytical Chemists,
Washington DC).
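
The crude protein calculation above can be written out as a short sketch. The factor 6.25 and the 50-800 g/kg range are taken from this section; the function names and the nitrogen values are hypothetical.

```python
NITROGEN_TO_PROTEIN_FACTOR = 6.25  # factor stated in the text above


def crude_protein_g_per_kg(nitrogen_g_per_kg):
    """Crude protein (g/kg) = N (g/kg) x 6.25."""
    if nitrogen_g_per_kg < 0:
        raise ValueError("nitrogen content must be non-negative")
    return nitrogen_g_per_kg * NITROGEN_TO_PROTEIN_FACTOR


def within_crude_protein_range(crude_protein, low=50.0, high=800.0):
    """Check a value against the 50-800 g/kg crude protein range given
    for an animal feed composition according to the invention."""
    return low <= crude_protein <= high


# Hypothetical Kjeldahl result: 32 g nitrogen per kg feed.
example_crude_protein = crude_protein_g_per_kg(32.0)
```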

Metabolisable energy can be calculated on the basis of the NRC publication Nutrient
requirements in swine, ninth revised edition 1988, subcommittee on swine nutrition, committee on
animal nutrition, board of agriculture, national research council, National Academy Press,
Washington, D.C., pp. 2-6, and the European Table of Energy Values for Poultry Feed-stuffs,
Spelderholt centre for poultry research and extension, 7361 DA Beekbergen, The Netherlands,
Grafisch bedrijf Ponsen & looijen bv, Wageningen. ISBN 90-71463-12-5.

The dietary content of calcium, available phosphorus and amino acids in complete animal 
diets is calculated on the basis of feed tables such as Veevoedertabel 1997, gegevens over 
chemische samenstelling, verteerbaarheid en voederwaarde van voedermiddelen. Centraal
Veevoederbureau, Runderweg 6, 8219 PK Lelystad. ISBN 90-72839-13-7.

In a particular embodiment the animal feed composition of the invention contains at least 
one vegetable protein or protein source as defined above. 

In still further particular embodiments, the animal feed composition of the invention 
contains 0-80% maize; and/or 0-80% sorghum; and/or 0-70% wheat; and/or 0-70% barley; and/or
0-30% oats; and/or 0-40% soybean meal; and/or 0-10% fish meal; and/or 0-20% whey. 

Animal diets can e.g. be manufactured as mash feed (non pelleted) or pelleted feed. 
Typically, the milled feed-stuffs are mixed and sufficient amounts of essential vitamins and 
minerals are added according to the specifications for the species in question. Enzymes can be 
added as solid or liquid enzyme formulations. For example, a solid enzyme formulation is typically 
added before or during the mixing step; and a liquid enzyme preparation is typically added after 
the pelleting step. The enzyme may also be incorporated in a feed additive or premix. 

The final enzyme concentration in the diet is within the range of 0.01-200 mg enzyme 
protein per kg diet, for example in the range of 0.5-25 mg enzyme protein per kg animal diet. 

The protease should of course be applied in an effective amount, i.e. in an amount 
adequate for improving solubilisation and/or improving nutritional value of feed. It is at present 
contemplated that the enzyme is administered in one or more of the following amounts (dosage 
ranges): 0.01-200; 0.01-100; 0.5-100; 1-50; 5-100; 10-100; 0.05-50; or 0.10-10 - all these ranges 
being in mg protease enzyme protein per kg feed (ppm). 

For determining mg enzyme protein per kg feed, the protease is purified from the feed 
composition, and the specific activity of the purified protease is determined using a relevant assay 
(see under protease activity, substrates, and assays). The protease activity of the feed 
composition as such is also determined using the same assay, and on the basis of these two 
determinations, the dosage in mg enzyme protein per kg feed is calculated. 

The same principles apply for determining mg enzyme protein in feed additives. Of course, 
if a sample is available of the protease used for preparing the feed additive or the feed, the 
specific activity is determined from this sample (no need to purify the protease from the feed 
composition or the additive). 
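
The dosage determination described above amounts to dividing the activity measured in the feed by the specific activity of the purified protease. The sketch below is a minimal illustration under that reading; the function name, the activity units and the example numbers are assumptions, not values from the specification.

```python
def enzyme_protein_mg_per_kg(feed_activity_units_per_kg: float,
                             specific_activity_units_per_mg: float) -> float:
    """Return mg enzyme protein per kg feed.

    feed_activity_units_per_kg: protease activity of the feed composition,
        measured with the chosen assay (units per kg feed).
    specific_activity_units_per_mg: activity of the purified protease in the
        same assay (units per mg enzyme protein).
    """
    if specific_activity_units_per_mg <= 0:
        raise ValueError("specific activity must be positive")
    return feed_activity_units_per_kg / specific_activity_units_per_mg


if __name__ == "__main__":
    # Hypothetical numbers: 5000 units/kg feed, 250 units/mg purified protease.
    print(enzyme_protein_mg_per_kg(5000.0, 250.0))
```

Both measurements must use the same assay, as the text requires; otherwise the units do not cancel.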

The present invention is further described by the following examples which should not be 
construed as limiting the scope of the invention. 

Detergent Compositions 

The protease of the invention may be added to and thus become a component of a 
detergent composition. The detergent composition of the invention may for example be formulated 
as a hand or machine laundry detergent composition including a laundry additive composition 
suitable for pre-treatment of stained fabrics and a rinse added fabric softener composition, or be 
formulated as a detergent composition for use in general household hard surface cleaning 
operations, or be formulated for hand or machine dishwashing operations. 

In a specific aspect, the invention provides a detergent additive comprising the protease of 
the invention. The detergent additive as well as the detergent composition may comprise one or 
more other enzymes such as another protease, such as alkaline proteases from Bacillus, a lipase, 
a cutinase, an amylase, a carbohydrase, a cellulase, a pectinase, a mannanase, an arabinase, a 
galactanase, a xylanase, an oxidase, e.g., a laccase, and/or a peroxidase. 

In general the properties of the chosen enzyme(s) should be compatible with the selected 
detergent, (i.e. pH-optimum, compatibility with other enzymatic and non-enzymatic ingredients, 
etc.), and the enzyme(s) should be present in effective amounts. 

Suitable lipases include those of bacterial or fungal origin. Chemically modified or protein 
engineered mutants are included. Examples of useful lipases include lipases from Humicola 
(synonym Thermomyces), e.g. from H. lanuginosa (T. lanuginosus) as described in EP 258068 
and EP 306216 or from H. insolens as described in WO 96/13580, a Pseudomonas lipase, e.g. 
from P. alcaligenes or P. pseudoalcaligenes (EP 218272), P. cepacia (EP 331376), P. stutzeri 
(GB 1,372,034), P. fluorescens, Pseudomonas sp. strain SD 705 (WO 95/06720 and WO 
96/27002), P. wisconsinensis (WO 96/12012), a Bacillus lipase, e.g. from B. subtilis (Dartois et al. 
(1993), Biochimica et Biophysica Acta, 1131, 253-360), B. stearothermophilus (JP 64/744992) or 
B. pumilus (WO 91/16422). Other examples are lipase variants such as those described in WO 
92/05249, WO 94/01541, EP 407225, EP 260105, WO 95/35381, WO 96/00292, WO 95/30744, 
WO 94/25578, WO 95/14783, WO 95/22615, WO 97/04079 and WO 97/07202. Preferred 
commercially available lipase enzymes include Lipolase™ and Lipolase Ultra™ (Novozymes A/S). 

Suitable amylases (alpha- and/or beta-) include those of bacterial or fungal origin. 
Chemically modified or protein engineered mutants are included. Amylases include, for example, 
alpha-amylases obtained from Bacillus, e.g. a special strain of B. licheniformis, described in more 
detail in GB 1,296,839. Examples of useful amylases are the variants described in WO 94/02597, 
WO 94/18314, WO 96/23873, and WO 97/43424, especially the variants with substitutions in one 
or more of the following positions: 15, 23, 105, 106, 124, 128, 133, 154, 156, 181, 188, 190, 197, 
202, 208, 209, 243, 264, 304, 305, 391, 408, and 444. Commercially available amylases are 
Duramyl™, Termamyl™, Fungamyl™ and BAN™ (Novozymes A/S), Rapidase™ and Purastar™ 
(from Genencor International Inc.). 

Suitable cellulases include those of bacterial or fungal origin. Chemically modified or 
protein engineered mutants are included. Suitable cellulases include cellulases from the genera 
Bacillus, Pseudomonas, Humicola, Fusarium, Thielavia, Acremonium, e.g. the fungal cellulases 
produced from Humicola insolens, Myceliophthora thermophila and Fusarium oxysporum 
disclosed in US 4,435,307, US 5,648,263, US 5,691,178, US 5,776,757 and WO 89/09259. 
Especially suitable cellulases are the alkaline or neutral cellulases having colour care benefits. 
Examples of such cellulases are cellulases described in EP 0 495257, EP 531372, WO 96/11262, 
WO 96/29397, WO 98/08940. Other examples are cellulase variants such as those described in 
WO 94/07998, EP 0 531 315, US 5,457,046, US 5,686,593, US 5,763,254, WO 95/24471, WO 
98/12307 and WO 99/01544. Commercially available cellulases include Celluzyme™ and 
Carezyme™ (Novozymes A/S), Clazinase™ and Puradax HA™ (Genencor International Inc.), and 
KAC-500(B)™ (Kao Corporation). 

Suitable peroxidases/oxidases include those of plant, bacterial or fungal origin. Chemically 
modified or protein engineered mutants are included. Examples of useful peroxidases include 
peroxidases from Coprinus, e.g. from C. cinereus, and variants thereof as those described in WO 
93/24618, WO 95/10602, and WO 98/15257. Commercially available peroxidases include 
Guardzyme™ (Novozymes). 

The detergent enzyme(s) may be included in a detergent composition by adding separate 
additives containing one or more enzymes, or by adding a combined additive comprising all of 
these enzymes. A detergent additive of the invention, i.e. a separate additive or a combined 
additive, can be formulated e.g. as a granulate, a liquid, a slurry, etc. Preferred detergent additive 
formulations are granulates, in particular non-dusting granulates, liquids, in particular stabilized 
liquids, or slurries. 

Non-dusting granulates may be produced, e.g., as disclosed in US 4,106,991 and 
4,661,452 and may optionally be coated by methods known in the art. Examples of waxy coating 
materials are poly(ethylene oxide) products (polyethyleneglycol, PEG) with mean molar weights of 
1000 to 20000; ethoxylated nonylphenols having from 16 to 50 ethylene oxide units; ethoxylated 
fatty alcohols in which the alcohol contains from 12 to 20 carbon atoms and in which there are 15 
to 80 ethylene oxide units; fatty alcohols; fatty acids; and mono- and di- and triglycerides of fatty 
acids. Examples of film-forming coating materials suitable for application by fluid bed techniques 
are given in GB 1483591. Liquid enzyme preparations may, for instance, be stabilized by adding a 
polyol such as propylene glycol, a sugar or sugar alcohol, lactic acid or boric acid according to 
established methods. Protected enzymes may be prepared according to the method disclosed in 
EP 238216. 

The detergent composition of the invention may be in any convenient form, e.g., a bar, a 
tablet, a powder, a granule, a paste or a liquid. A liquid detergent may be aqueous, typically 
containing up to 70 % water and 0-30 % organic solvent, or non-aqueous. 

The detergent composition comprises one or more surfactants, which may be non-ionic 
including semi-polar and/or anionic and/or cationic and/or zwitterionic. The surfactants are 
typically present at a level of from 0.1% to 60% by weight. 

When included therein the detergent will usually contain from about 1% to about 40% of an 
anionic surfactant such as linear alkylbenzenesulfonate, alpha-olefinsulfonate, alkyl sulfate (fatty 
alcohol sulfate), alcohol ethoxysulfate, secondary alkanesulfonate, alpha-sulfo fatty acid methyl 
ester, alkyl- or alkenylsuccinic acid or soap. 

When included therein the detergent will usually contain from about 0.2% to about 40% of 
a non-ionic surfactant such as alcohol ethoxylate, nonylphenol ethoxylate, alkylpolyglycoside, 
alkyldimethylamineoxide, ethoxylated fatty acid monoethanolamide, fatty acid monoethanolamide, 
polyhydroxy alkyl fatty acid amide, or N-acyl N-alkyl derivatives of glucosamine ("glucamides"). 

The detergent may contain 0-65 % of a detergent builder or complexing agent such as 
zeolite, diphosphate, triphosphate, phosphonate, carbonate, citrate, nitrilotriacetic acid, 
ethylenediaminetetraacetic acid, diethylenetriaminepentaacetic acid, alkyl- or alkenylsuccinic acid, 
soluble silicates or layered silicates (e.g. SKS-6 from Hoechst). 

The detergent may comprise one or more polymers. Examples are 
carboxymethylcellulose, poly(vinylpyrrolidone), poly(ethylene glycol), poly(vinyl alcohol), 
poly(vinylpyridine-N-oxide), poly(vinylimidazole), polycarboxylates such as polyacrylates, 
maleic/acrylic acid copolymers and lauryl methacrylate/acrylic acid copolymers. 

The detergent may contain a bleaching system which may comprise a H2O2 source such 
as perborate or percarbonate which may be combined with a peracid-forming bleach activator 
such as tetraacetylethylenediamine or nonanoyloxybenzenesulfonate. Alternatively, the bleaching 
system may comprise peroxyacids of e.g. the amide, imide, or sulfone type. 

The enzyme(s) of the detergent composition of the invention may be stabilized using 
conventional stabilizing agents, e.g., a polyol such as propylene glycol or glycerol, a sugar or 
sugar alcohol, lactic acid, boric acid, or a boric acid derivative, e.g., an aromatic borate ester, or a 
phenyl boronic acid derivative such as 4-formylphenyl boronic acid, and the composition may be 
formulated as described in e.g. WO 92/19709 and WO 92/19708. 

The detergent may also contain other conventional detergent ingredients such as e.g. 
fabric conditioners including clays, foam boosters, suds suppressors, anti-corrosion agents, 
soil-suspending agents, anti-soil redeposition agents, dyes, bactericides, optical brighteners, 
hydrotropes, tarnish inhibitors, or perfumes. 

It is at present contemplated that in the detergent compositions any enzyme, in particular 
the enzyme of the invention, may be added in an amount corresponding to 0.01-100 mg of 
enzyme protein per liter of wash liquor, preferably 0.05-5 mg of enzyme protein per liter of wash 
liquor, in particular 0.1-1 mg of enzyme protein per liter of wash liquor. 

The enzyme of the invention may additionally be incorporated in the detergent formulations 
disclosed in WO 97/07202. 

The invention described and claimed herein is not to be limited in scope by the specific 
embodiments herein disclosed, since these embodiments are intended as illustrations of several 
aspects of the invention. Any equivalent embodiments are intended to be within the scope of this 
invention. Indeed, various modifications of the invention in addition to those shown and described 
herein will become apparent to those skilled in the art from the foregoing description. Such 
modifications are also intended to fall within the scope of the appended claims. In the case of 
conflict, the present disclosure including definitions will control. 

Various references are cited herein, the disclosures of which are incorporated by reference 
in their entireties. 

EXAMPLES 
Materials and methods 

Strains: 

Bacillus subtilis PL1801 (Diderichsen, B. et al. 1990. Cloning of aldB, which encodes 
alpha-acetolactate decarboxylase, an exoenzyme from Bacillus brevis. J. Bacteriol., 
172, 4315-4321) 
Bacillus subtilis MB1053 
Bacillus subtilis PL3598-37 
Bacillus subtilis MB1510 

Bacillus subtilis PL2306. This strain is the B.subtilis DN1885 with disrupted apr and npr 
genes (Diderichsen, B., Wedsted, U., Hedegaard, L., Jensen, B. R., Sjoholm, C. (1990) 
Cloning of aldB, which encodes alpha-acetolactate decarboxylase, an exoenzyme from 
Bacillus brevis. J. Bacteriol., 172, 4315-4321) which is also disrupted in the 
transcriptional unit of the known Bacillus subtilis cellulase gene, resulting in cellulase 
negative cells. The disruption was performed essentially as described in (Eds. A.L. 
Sonenshein, J.A. Hoch and Richard Losick (1993) Bacillus subtilis and other 
Gram-Positive Bacteria, American Society for Microbiology, p.618). 

Procedure for isolating genomic DNA. 

Harvest 1.5 ml culture and resuspend in 100 µl TEL. Leave at 37°C for 30 min. 
Add 500 µl thiocyanate buffer and leave at room temperature for 10 min. 
Add 250 µl NH4Ac and leave on ice for 10 min. 
Add 500 µl CIA and mix. 
Transfer to a microcentrifuge and spin for 10 min at full speed. 
Transfer supernatant to a new Eppendorf tube and add 0.54 volume cold isopropanol. Mix 
thoroughly. 
Spin and wash the DNA pellet with 70 % EtOH. 
Resuspend the genomic DNA in 100 µl TER. 

TE: 10 mM Tris-HCl, pH 7.4 
1 mM EDTA, pH 8.0 

TEL: 50 mg/ml lysozyme in TE-buffer 

Thiocyanate: 5 M guanidinium thiocyanate 
100 mM EDTA 
0.6 % w/v N-laurylsarcosine, sodium salt. 
60 g thiocyanate, 20 ml 0.5 M EDTA, pH 8.0, and 20 ml H2O are 
dissolved at 65°C. Cool down to RT and add 0.6 g 
N-laurylsarcosine. Add H2O to 100 ml and filter through a 0.2 µm 
sterile filter. 

NH4Ac: 7.5 M CH3COONH4 

TER: 1 µg/ml RNase A in TE-buffer 

CIA: Chloroform/isoamyl alcohol 24:1 
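
The thiocyanate buffer recipe above can be cross-checked against its stated final concentrations. The following Python sketch is illustrative only; the helper names are hypothetical, and the molar mass of guanidinium thiocyanate (118.16 g/mol) is a literature value that does not appear in the specification.

```python
# Molar mass of guanidinium thiocyanate (CH5N3.HSCN); assumed literature value.
GUANIDINIUM_THIOCYANATE_MW = 118.16  # g/mol


def molarity(mass_g: float, molar_mass: float, final_volume_ml: float) -> float:
    """Molar concentration of a solid dissolved to a final volume."""
    return mass_g / molar_mass / (final_volume_ml / 1000.0)


def diluted_mM(stock_molar: float, stock_volume_ml: float,
               final_volume_ml: float) -> float:
    """Final concentration in mM after diluting a stock solution."""
    return stock_molar * 1000.0 * stock_volume_ml / final_volume_ml


def percent_w_v(mass_g: float, final_volume_ml: float) -> float:
    """Weight/volume percentage (g per 100 ml)."""
    return mass_g / final_volume_ml * 100.0


if __name__ == "__main__":
    # Recipe from the text: 60 g thiocyanate, 20 ml 0.5 M EDTA,
    # 0.6 g N-laurylsarcosine, made up to 100 ml.
    print(round(molarity(60.0, GUANIDINIUM_THIOCYANATE_MW, 100.0), 2), "M thiocyanate")
    print(diluted_mM(0.5, 20.0, 100.0), "mM EDTA")
    print(round(percent_w_v(0.6, 100.0), 2), "% w/v N-laurylsarcosine")
```

Under these assumptions the recipe gives roughly 5 M thiocyanate, 100 mM EDTA and 0.6 % w/v N-laurylsarcosine, in line with the stated composition.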

Purification of PCR bands and DNA sequencing 

PCR fragments can be purified using the GFX™ PCR DNA and Gel Band™ Purification Kit 
(Pharmacia Biotech) according to the manufacturer's instructions. The nucleotide sequences of 
the amplified PCR fragments are determined on an ABI PRISM™ 3700 DNA Analyzer (Perkin 
Elmer, USA) using 50-100 ng as template, the Taq deoxy-terminal cycle sequencing kit (Perkin 
Elmer, USA), fluorescently labeled terminators and 5 pmol of the sequencing primer of choice. 



Media 






TY: (As described in Ausubel, F. M. et al. (eds.) "Current protocols in Molecular Biology". John 
Wiley and Sons, 1995). 

LB agar (As described in Ausubel, F. M. et al. (eds.) "Current protocols in Molecular Biology". 
John Wiley and Sons, 1995). 

LB-PG agar is LB agar supplemented with 0.5% Glucose and 0.05 M potassium phosphate, pH 
7.0. 

Proteolytic activity 

S2A protease activity is measured using the PNA assay with 
succinyl-alanine-alanine-proline-phenylalanine-paranitroanilide as a substrate unless otherwise mentioned. The principle of the 
PNA assay is described in Rothgeb, T.M., Goodlander, B.D., Garrison, P.H., and Smith, L.A., 
Journal of the American Oil Chemists' Society, Vol. 65 (5) pp. 806-810 (1988). 

Gene expression in Bacillus subtilis host 

All the expressed genes in the following examples are integrated by homologous 
recombination on the Bacillus subtilis host cell genome. The genes are expressed under the 
control of a triple promoter system (as described in WO 99/43835), consisting of the promoters 
from Bacillus licheniformis alpha-amylase gene (amyL), Bacillus amyloliquefaciens alpha-amylase 
gene (amyQ), and the Bacillus thuringiensis cryIIIA promoter including stabilizing sequence. The 
gene coding for chloramphenicol acetyl-transferase was used as marker (described in e.g. 
Diderichsen, B.; Poulsen, G.B.; Joergensen, S.T.; A useful cloning vector for Bacillus subtilis. 
Plasmid 30:312 (1993)). 

Example 1 

A synthetic 10R gene (10RS) encoding a S2A protease denoted 10R from Nocardiopsis 
sp. NRRL 18262 (WO 01/58276) was constructed which has the nucleotide sequence shown in 
SEQ ID NO: 1. This synthetic gene was fused by PCR in frame to the DNA coding for the signal 
peptide from SAVINASE™ (Novozymes) resulting in the coding sequence Sav-10RS which is 
shown in SEQ ID NO: 2. Several tail-variants of this construct were made. Compared to the 
Sav-10RS protease encoded by SEQ ID NO: 2, the tail-variant construct Sav-10RS HV0 was 
constructed to have 8 extra amino acids at the C-terminus: QSHVQSAP (SEQ ID NO: 3), which 
were encoded by the following DNA sequence extension inserted in front of the TAA stop codon of 
SEQ ID NO: 2: 

(SEQ ID NO: 4): caatcgcatgttcaatccgctcca 

Tail variant Sav-10RS HV1 was constructed to have 4 extra amino acids at the 
C-terminus: QSAP (SEQ ID NO: 5), with the following DNA sequence extension inserted in front of 
the TAA stop codon: 

(SEQ ID NO: 6): caatcggctcct 

Tail variant Sav-10RS HV3 was constructed to have 2 extra amino acids at the 
C-terminus: QP (SEQ ID NO: 7), with the following DNA sequence extension inserted in front of the 
TAA stop codon: 

(SEQ ID NO: 8): caacca 

Tail variant Sav-10RS HV2 was constructed to have one extra amino acid at the 
C-terminus: P (SEQ ID NO: 9), with the following DNA sequence extension inserted in front of the 
TAA stop codon: 

(SEQ ID NO: 10): cca 
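
As a consistency check on the four tail extensions listed above, each DNA extension can be translated with the standard genetic code and compared with the stated amino acid tail. This is a minimal Python sketch; the function and variable names are hypothetical, and only the sequences themselves are taken from the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    dna = dna.lower()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# DNA extensions and the tails they are stated to encode.
TAIL_EXTENSIONS = {
    "HV0": ("caatcgcatgttcaatccgctcca", "QSHVQSAP"),
    "HV1": ("caatcggctcct", "QSAP"),
    "HV3": ("caacca", "QP"),
    "HV2": ("cca", "P"),
}

if __name__ == "__main__":
    for name, (dna, expected) in TAIL_EXTENSIONS.items():
        print(name, translate(dna), translate(dna) == expected)
```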

The 10RS gene and the four tail-variant encoding genes were integrated by homologous 
recombination into the Bacillus subtilis MB1053 host cell genome. Chloramphenicol resistant 
transformants were checked for protease activity on 1% skim milk LB-PG agar plates 
(supplemented with 6 µg/ml chloramphenicol). Some protease positive colonies were further 
analyzed by DNA sequencing of the insert to ensure the correct gene DNA sequence, and five 
strains, each comprising one of the above constructs, were selected and denoted, respectively: 
B.subtilis Sav-10RS, B.subtilis Sav-10RS HV0, B.subtilis Sav-10RS HV1, B.subtilis Sav-10RS 
HV2 and B.subtilis Sav-10RS HV3. 

Example 2 

Fermentations for the production of the tail-variant enzymes of the invention were 
performed on a rotary shaking table in 500 ml baffled Erlenmeyer flasks each containing 100 ml 
TY supplemented with 6 mg/l chloramphenicol. 

Six Erlenmeyer flasks for each of the five B. subtilis strains from example 1 were 
fermented in parallel. Two of the six Erlenmeyer flasks were incubated at 37°C (250 rpm), two at 
30°C (250 rpm), and the last two at 26°C (250 rpm). A sample was taken from each shake flask at 
day 1, 2 and 3 and analyzed for proteolytic activity. The results are shown in tables 1-3. As can 
be seen from tables 1-3, the effect of the tails is a surprisingly high improvement in the 
expression level of the protease, as measured by activity in the culture broth. The effect is most 
pronounced at 26°C and 30°C, but is also evident at 37°C, especially at the 
early stage of the fermentation. 



Table 1: Relative proteolytic activities at 37°C. 

                 Day 1    Day 2    Day 3 
Sav-10RS         1.0      1.0      1.0 
Sav-10RS HV0     3.3      0.7      0.8 
Sav-10RS HV1     *7       1.3      1.2 
Sav-10RS HV2     2.2      0.6      0.4 
Sav-10RS HV3     5.3      1.4      1.7 


Table 2: Relative proteolytic activities at 30°C. 

                 Day 1    Day 2    Day 3 
Sav-10RS         1.0      1.0      1.0 
Sav-10RS HV0     1.7      2.2      2.9 
Sav-10RS HV1     4.6      3.1      4.9 
Sav-10RS HV2     2.4      1.9      2.3 
Sav-10RS HV3     4.8      3.0      4.4 


Table 3: Relative proteolytic activities at 26°C. 

                 Day 1    Day 2    Day 3 
Sav-10RS         1.0      1.0      1.0 
Sav-10RS HV0     1.8      2.5      3.1 
Sav-10RS HV1     2.5      3.6      4.3 
Sav-10RS HV2     1.8      2.6      2.8 
Sav-10RS HV3     2.6      3.5      4.6 



Example 3 

The following construct was used for the chromosomal integration of the tail-variant 
encoding genes. The coding sequence of the well-known subtilisin BPN' protease was 
operationally linked to a triple promoter, a marker gene was fused to this (a spectinomycin 
resistance gene surrounded by resolvase res-sites), and pectate lyase encoding genes from 
Bacillus subtilis were fused to the construct as flanking segments comprising the 5' polynucleotide 
region upstream [yfmD-yfmC-yfmB-yfmA-Pel-start], and the 3' polynucleotide region downstream 
[Pel-end-yflS-citS(start)] of the tail-variant encoding polynucleotide, respectively. The integrational 
cassette was made by the joining of several different PCR fragments. After the final PCR reaction 
the PCR product was used for transformation of naturally competent B. subtilis cells. One clone 
denoted PL3598-37 was selected and confirmed by sequencing to contain the correct construct. 



The PL3598-37 clone thus contains the following: 

1. The flanking regions 100% homologous to regions of the B.subtilis genome (appearing as the 
upstream fragment yfmD-yfmC-yfmB-yfmA-Pel-start and the downstream fragment 
Pel-end-yflS-citS(start)). 

2. The Spectinomycin resistance gene flanked by Resolvase sites (res). 

3. The triple promoter region plus cryIIIA mRNA stabilising leader sequence. 

4. The BPN' Open Reading Frame. 

Construction of triple promoter BPN' cassette 

A PCR fragment comprising the integrational cassette for a BPN' library was 
constructed, thus operably linking a triple promoter (as described in WO 99/43835; Novozymes) to 
a BPN' expression cassette from a Bacillus strain. The triple promoter is a fusion of an optimized 
Bacillus amyL-derived promoter (as shown in WO 93/10249; Novozymes) with two promoters 
scBAN and cryIIIA, where the first is a consensus version of the Bacillus amyloliquefaciens 
amylase BAN promoter, and the latter includes an mRNA-stabilising sequence (as described in WO 
99/43835; Novozymes). Suitable primers can be derived from the publicly available sequences 
(Vasantha, N. et al. Genes for alkaline protease and neutral protease from Bacillus 
amyloliquefaciens contain a large open reading frame between the regions coding for signal 
sequence and mature protein. J. Bacteriol. 159:811 (1984) EMBL: accession No. K02496). A KpnI 
and a SalI restriction site were introduced to flank the PCR fragment at each end, using the 
primers: 

#252639 (SEQ ID NO: 11): catgtgcatgtgggtaccgcaacgttcgcagatgctgctgaagag 
#251992 (SEQ ID NO: 12): catgtgcatgtggtcgaccgattatggagcggattgaacatgcg 

The KpnI and SalI restriction sites in the PCR fragment were subsequently used to clone 
the fragment into a KpnI-SalI digested Peel-Spec PCR fragment. The Peel-Spec fragment 
comprises a Spectinomycin resistance gene inserted in the middle of the B.subtilis Pectate lyase 
gene plus approx. 2.3 kb of upstream genomic DNA and approx. 1.7 kb downstream genomic 
DNA. The Peel-Spec fragment was produced by PCR amplification of genomic DNA from the 
B.subtilis strain MB1053, using the primers: 

#179541 (SEQ ID NO: 13): gcgttgagacgcgcggccgcgagcgccgtttggcAgaatgatac 
#179542 (SEQ ID NO: 14): gcgttgagacagctcgagcagggaaaaatggaaccgctttttc 

Construction of MB1053 

The MB1053 B.subtilis strain was constructed by deletion of the pectate lyase (Pel) gene 
through integration of a PCR product into a wild-type B.subtilis type strain genome. This was 
achieved by PCR amplification of genomic DNA directly downstream and upstream of the 
pectate lyase gene of B.subtilis. 

The ends of the genomic DNA directly preceding and following the Pel gene were 
elongated through primer insertion of sequences being 100% homologous to DNA sequences 
defined by the ends of a third PCR fragment encoding a marker gene surrounded by Resolvase 
(Res) sites. In this particular case the marker gene (Spec) conferred resistance to spectinomycin, 
and it was situated between two Res sites, altogether present on the plasmid pSJ3358 (described 
in US patent No. 5,882,888). Three different PCR fragments were initially produced. 

Fragment 1: this fragment covers from the yfmD gene to the middle of the Pel gene and 
introduces an overhang to the Res-Spec-Res cassette at the Pel gene. The size of fragment 1 is 
2.8 kb. The fragment was produced by PCR amplification of chromosomal DNA from the B.subtilis 
strain PL2306, using the primers: 



#179541 (SEQ ID NO: 13), and 

#179539 with overlap to #179154 Spec primer (SEQ ID NO: 15): 

itcc 




Fragment 2: this fragment covers from the middle of the Pel gene to after the end of the CitS gene 
and introduces an overhang to the Res-Spec-Res cassette at the middle of the Pel gene. The 
size of fragment 2 is 2.3 kb. The fragment was produced by PCR amplification of chromosomal 
DNA from the B.subtilis strain PL2306, using the primers: 

#179542 (SEQ ID NO: 14), and 

#179540 with overlap to #179153 Spec primer (SEQ ID NO: 16): 



Fragment 3: this fragment contains the Spectinomycin gene surrounded by Res sites and DNA 
sequences in the ends overlapping with PCR fragment 1 and 2. The size of fragment 3 is 1.6 kb. 
Fragment 3 was produced by PCR amplification of plasmid pSJ3358, using the primers: 

#179154 (SEQ ID NO: 17): gttgtaaaacgacggccagtgaattctgatcaaatgg 
#179153 (SEQ ID NO: 18): ccgcgtcgacactagacacgggtacctgatctagatc 

Standard conditions for the PCR reaction 

For the PCR amplifications of fragments 1-3 the HiFi Expand™ PCR system (Roche) was 
used with the following reaction mix and cycling scheme: 

5 µl Buffer 2 
14 µl dNTPs (1.25 mM each) 
2.5 µl 20 µM primer 1 
2.5 µl 20 µM primer 2 
x µl water 

To this mix 3 µl of DNA (approx. 100 ng) and 0.75 µl Enzyme mix (use hot start) are added. 
Total volume is 50 µl. 

The cycling profile is: 

1 cycle of 120 sec at 94°C. 
Break. 
10 cycles of 15 sec at 94°C, 60 sec at 60°C, 240 sec at 72°C. 
20 cycles of 15 sec at 94°C, 60 sec at 60°C, 180 sec at 72°C (add 20 sec per cycle). 
1 cycle of 600 sec at 68°C. 
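
The water volume x in the standard reaction mix above follows from the stated total volume of 50 µl once all other components, including the DNA and enzyme mix, are accounted for. The Python sketch below illustrates that arithmetic; the function and dictionary names are hypothetical.

```python
def water_volume_ul(total_ul: float, component_volumes_ul: dict) -> float:
    """Return the water volume needed to reach the total reaction volume."""
    used = sum(component_volumes_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return total_ul - used


# Component volumes (µl) as listed for the standard PCR reaction.
STANDARD_MIX_UL = {
    "Buffer 2": 5.0,
    "dNTPs": 14.0,
    "primer 1": 2.5,
    "primer 2": 2.5,
    "template DNA": 3.0,
    "enzyme mix": 0.75,
}

if __name__ == "__main__":
    print(water_volume_ul(50.0, STANDARD_MIX_UL), "µl water")
```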

The three PCR fragments were made and joined in later JOINING-PCR reactions. The three PCR 
fragments were single sharp bands and no gel purification was necessary. Only Qiagen™ PCR 
purification was performed prior to the following JOINING-PCR. 

JOINING of fragment 1+3 (same procedure for fragment 2+3): 

5 µl Buffer 2 
8 µl dNTPs (1.25 mM each) 
5.0 µl Fragment 3 
5.0 µl Fragment 1 
9.25 µl water 

1 cycle of 120 sec at 94°C. 
Break. Add Enzyme 
10 cycles of 15 sec at 94°C, 60 sec at 60°C, 240 sec at 72°C. 
Break. Add Primers 
15 cycles of 15 sec at 94°C, 60 sec at 60°C, 180 sec at 72°C (add 20 sec per cycle). 
1 cycle of 600 sec at 68°C. 

After the first cycle at 94°C for 120 sec there is a break, where 0.75 µl Enzyme mix is added. 
Total volume is now 45.0 µl. 

After the initial 10 cycles, there is another break in the cycling. For fragment 1+3, 2.5 µl (20 µM 
#179541) and 2.5 µl (20 µM #179153) are added, and for fragment 2+3, 2.5 µl (20 µM #179542) 
and 2.5 µl (20 µM #179154) are added, and the cycling is continued for 15 cycles more. 

The PCR products were then gel purified. The size of fragment 1+3 should be 3.4 kb 
and the size of fragment 2+3 should be 3.4 kb. These two fragments were joined in a last PCR 
reaction (Expand™ long system, Roche): 

5 µl Buffer 1 
14 µl dNTPs (1.25 mM each) 
5.0 µl Fragment 1+3 
5.0 µl Fragment 2+3 
17.75 µl water 

After the first cycle at 94°C for 120 sec there is a break, where 0.75 µl Enzyme mix is added. 
Total volume is now 45.0 µl. 

After the initial 10 cycles, there is another break in the cycling and 2.5 µl (20 µM #179541) and 2.5 
µl (20 µM #179542) are added and the cycling is continued for 15 cycles more. 

1 cycle of 120 sec at 94°C. 
Break. Add Enzyme 
10 cycles of 15 sec at 94°C, 60 sec at 60°C, 240 sec at 68°C. 
Break. Add Primers 
15 cycles of 15 sec at 94°C, 60 sec at 60°C, 180 sec at 68°C (add 20 sec per cycle). 
1 cycle of 600 sec at 68°C. 

The size of the joined PCR fragment is 6.8 kb. This PCR fragment was purified using a 
Qiagen™ PCR purification kit, and 5 µl of the 50 µl eluted DNA was used to transform a standard 
B.subtilis strain. After transformation, cells were spread onto LBPG plates with 120 µg/ml spectinomycin. 
The next day more than 1000 colonies were seen. Eight of these were checked using the PCR primers from 
the last JOINING PCR amplification, yielding a PCR fragment of 6.8 kb rather than the 5.2 kb expected if 
deletion had not occurred. Furthermore, the pectate lyase activity of the clones was checked with 
the Mancini Immunoassay, which showed no reactivity towards the pectate lyase. This, 
taken together with the Spec resistance, indicates that deletion had occurred. One such clone was 
selected and denoted MB1053. 

Insertion of BPN' expression cassette adjacent to the res-spec-res in MB1053 

The ligation mix of the digested PCR amplified triple promoter BPN' expression cassette 
and the KpnI-SalI digested Peel-Spec PCR fragment was used as template in a PCR amplification 
using the PCR primers #179541 and #179542. This resulted in a PCR fragment of approx. 9 kb, 
which was used to transform B.subtilis PL1801 (Diderichsen, B. et al. 1990. Cloning of aldB, which 
encodes alpha-acetolactate decarboxylase, an exoenzyme from Bacillus brevis. J. Bacteriol., 172, 
4315-4321) competent cells. The transformed cells were plated on LB-120 µg/ml Spectinomycin 
agar plates with skim milk. Spectinomycin resistant colonies with large skim milk clearing zones 
were restreaked on Spectinomycin agar plates and analysed for the integration of the PCR 
fragment by PCR using the primers #179541 (SEQ ID NO: 13) and #179542 (SEQ ID NO: 14). 

Appearance of a 9 kb fragment indicates that the PCR fragment has been integrated 
into the host cell genome. Several of these clones were sequenced to confirm integration of the 
expression cassette; one such clone was selected and denoted PL3598-37. 

Example 4 

An E.coli plasmid-borne integrational cassette for a library may be constructed in vivo. 
An integration cassette to be used according to the method of the invention may be present on an 
E.coli plasmid (which is capable of replication only in E.coli, not in B.subtilis), the plasmid 
comprising: 

i) The DNA sequence encoding the Pre-Pro-domains of the subtilisin protease 
commonly known as Savinase, preceded by and operably linked to 

ii) a DNA sequence comprising an mRNA stabilising segment derived in this particular 
case from the cryIIIA gene; 

iii) a marker gene (a chloramphenicol resistance gene), and 

iv) genomic DNA from Bacillus subtilis as 5' and 3' flanking segments: the homologous 
5' polynucleotide region upstream of the polynucleotide [yfmD-yfmC-yfmB-yfmA-Pel-start], and 
the 3' polynucleotide region downstream of the polynucleotide [Pel-end-yflS-citS(start)], 
respectively. 

The cassette was made by several cloning steps involving digestion of the generally used 
plasmid pUC19 and of several different PCR fragments with appropriate restriction 
endonucleases. After each ligation of a PCR fragment into a 
plasmid, the ligation mixture was transformed into electrocompetent DH5alpha E.coli cells that 
were prepared for and transformed by electroporation using a Gene Pulser™ electroporator from 
BIO-RAD as described by the supplier. One final plasmid construct was confirmed by sequencing 
to contain the correct construct as outlined above, and it was denoted pMB1508. 

The pMB1508 plasmid thus contains the following: 

i) The CryIIIA mRNA stabilising leader sequence including a ribosome binding sequence 
(RBS), operably linked to 

ii) DNA encoding the Pre-Pro-domains of the subtilisin commonly known as Savinase, 
including KpnI and NotI sites for cloning; 

iii) The chloramphenicol resistance operon; 

iv) The 3' downstream flanking region [Pel-end-yflS-citS(start)], which is 99-100% 
homologous to the corresponding region of B. subtilis. 

The four elements listed were cloned in the pUC19 vector (isolated from E. coli ATCC 
37254; Vieira J, Messing J. The pUC plasmids, an M13mp7-derived system for insertion 
mutagenesis and sequencing with synthetic universal primers. Gene 19: 259-268, 1982) in the 
EcoRI and SalI sites to give pMB1508. In order for the resulting plasmid to integrate efficiently at 
a specified site of the B. subtilis genome, a new strain was established. The new strain is a 
derivative of Bacillus subtilis 168 (BGSC accession number 1A1; 168 trpC2). The strain was made 
competent and transformed as described above. Using elements from the PL3598-37 clone 
described above, the new integration strain denoted MB1510 was established and characterised 
to contain the following elements from PL3598-37: 

i) The triple promoter and the mRNA stabilising element. 

ii) Flanking segments comprising the following homologous polynucleotide region 
[yfmD-ytmC-yfmB-yfmA-Pel-start] upstream of the triple-promoter, and the polynucleotide region 
[Pel-end-yflS-citS(start)] downstream of the mRNA stabilizing element. 

Thus, when using MB1510 competent cells, it is possible for pMB1508 (or 
derivatives thereof) to integrate directly into the genome of MB1510 where the two flanking 
regions in fusion with the triple-promoter and mRNA stabilising element are located, resulting in a 
construction where the incoming PrePro-encoding DNA of pMB1508 has been integrated in 
the correct reading frame with the triple-promoter, the mRNA stabilising element and the RBS. 
This results in high expression of the integrated gene from the promoter elements already 
present on the genome of MB1510. 

Transformation efficiency was established for the B. subtilis strain MB1510 transformed 
with E. coli prepared plasmid pMB1508. For further testing of the potential of using this approach, 
the Savinase encoding gene of Bacillus clausii was PCR amplified using the two PCR primers: 

Primer #31 7 (SEQ ID NO: 19) tggcgcaatcggtaccatgggg 

Primer #139 Notl (SEQ ID NO: 20) catgtgcatgcggccgcattaacgco.ttgccgcttctgcg 

The resulting ~0.8 kb Savinase fragment and the pMB1508 plasmid are digested 
with KpnI and NotI, and the resulting fragments are then purified by agarose gel electrophoresis. 
The two fragments are ligated, and the ligation mixture is used to transform competent E. coli cells 
which are then plated on LB-agar plates or placed in liquid media for growth overnight at 37°C; 
both types of media contain 50-100 ug/ml of ampicillin. After incubation, a plasmid prep is made 
of the liquid culture. The purified plasmid is used for transformation of competent cells of MB1510 
(using 100-10,000 ng of plasmid per transformation). The transformed cells are plated onto TY 
medium with 2% skim milk and 6 ug/ml of chloramphenicol for selection. After overnight incubation 
at 37°C clearing zones appear around those colonies wherein the integration cassette is 
integrated properly into the cells, indicating high Savinase expression. 

This approach can also be used to make highly diverse libraries of any gene of interest 
expressible in B. subtilis, where rather than a gene encoding one enzyme, any expressible 
polynucleotide is inserted into the plasmid pMB1508 and integrated into the MB1510 strain for 
subsequent screening. 

Sequence of plasmid pMB1508 (SEQ ID NO: 21) 

The plasmid pMB1508 has the following components, indicated by basepair positions: 

BP 5186-395: pUC19 sequence from E. coli clone ATCC 37254 (Vieira J, Messing J. The 
pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with 
synthetic universal primers. Gene 19: 259-268, 1982). 

BP 396-1021: EcoR I cloning site (BP 396-401) and the CryIIIA mRNA stabilising 
element (described in WO 9634963-A1). 

BP 1022-1412: Encodes the Pre-Pro sequence of Savinase and the NotI cloning site 
(Pre-Pro part described in e.g. WO 9623073-A1; the NotI site and the spacing between the 
Pre-Pro and NotI were introduced by the PCR primer). 

BP 1413-2512: The Bgl II cloning site (BP 1413-1418) and the chloramphenicol acetyl- 
transferase operon of pDN1050 (described in e.g. Diderichsen, B.; Poulsen, G.B.; Joergensen, S.T.; 
A useful cloning vector for Bacillus subtilis. Plasmid 30:312 (1993)). 

BP 2513-5185: The polynucleotide region [Pel-end-yflS-citS(start)] downstream of the 
pelB locus of the B. subtilis genome (as it appears from the publication and corresponding 
database of: F. Kunst, N. Ogasawara, I. Moszer, <146 other authors>, H. Yoshikawa, A. Danchin. 
"The complete genome sequence of the Gram-positive bacterium Bacillus subtilis" 
Nature (1997) 390:249-256). 

The Bacillus subtilis strain MB1510 

MB1510 has the following specific features in and around the pelB locus: 

i) The triple promoter and the mRNA stabilising element including a RBS (Ribosome binding 
sequence). 

ii) Flanking segments comprising the following homologous polynucleotide region 
[yfmD-ytmC-yfmB-yfmA-Pel-start] upstream of the triple-promoter, and the polynucleotide region 
[Pel-end-yflS-citS(start)] downstream of the mRNA stabilizing sequence. 

Sequence of MB1510 genomic integration region (SEQ ID NO: 22) 

BP 1-2873: corresponds to sequence of Bacillus subtilis genome yfmD-ytmC-yfmB- 
yfmA-Pel-start (as it appears from the publication and corresponding database of: F. Kunst et al. 
"The complete genome sequence of the Gram-positive bacterium Bacillus subtilis" 
Nature (1997) 390:249-256). 

BP 3102-4082: The triple promoter and CryIIIA mRNA stabilising element plus RBS 
(described above in PL3598-37 construct). 

BP 4083-5718: The polynucleotide region [Pel-end-yflS-citS(start)] at the end of and 
downstream of the pelB locus of the B. subtilis genome (as it appears from the publication and 
corresponding database of: F. Kunst, N. Ogasawara, I. Moszer, <146 other authors>, H. 
Yoshikawa, A. Danchin. "The complete genome sequence of the Gram-positive bacterium Bacillus 
subtilis" Nature (1997) 390:249-256). 

Example 5. 

Another tail-variant library was constructed. In this library two amino acids were 
introduced at the C-terminal of the 10R protein. Such a Tail-library may be made with the method 
described above using the following PCR primers in a PCR reaction using genomic DNA from 
B.subtilis 10RS as template: 

1605 (SEQ ID NO: 23): gacggccagtgaattcgataaaagtgc 

1606 (SEQ ID NO: 24): ccagatctctatnktnktgtacggagtctaactccccaagag 

wherein N = A, C, G or T; and K = T or G. 

The resulting PCR product was digested with EcoR I and Bgl II and ligated into EcoR I 
and Bgl II digested pMB1508, and the procedure thereafter followed the principle described above. 
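
The randomised positions of primer 1606 can be enumerated directly. The following is a minimal Python sketch and not part of the original method: it assumes that the two `tnk` triplets immediately following `cta` in the primer are the two tail codons and that `cta` is the reverse complement of the stop codon. It reverse-complements these triplets to the coding strand and lists the amino acids they can encode under the standard genetic code. All function and variable names are illustrative.

```python
from itertools import product

# IUPAC nucleotide codes occurring in primer 1606 (lower case, as in the text).
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "n": "acgt", "k": "gt", "m": "ac"}
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a", "n": "n", "k": "m", "m": "k"}

# Standard genetic code, built in the conventional t-c-a-g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PRIMER_1606 = "ccagatctctatnktnktgtacggagtctaactccccaagag"


def reverse_complement(seq):
    """Return the reverse complement, keeping IUPAC ambiguity codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def expand(degenerate_codon):
    """List every concrete codon covered by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_residues(degenerate_codon):
    """Sorted one-letter amino acids encoded by a degenerate codon."""
    return sorted({CODON_TABLE[c] for c in expand(degenerate_codon)})


# Assumption: primer positions 8-10 (0-based) are the stop codon and
# positions 11-13 and 14-16 are the two tail codons, all on the reverse strand.
stop_codon = reverse_complement(PRIMER_1606[8:11])
tail_codon = reverse_complement(PRIMER_1606[11:14])
tail_codon_2 = reverse_complement(PRIMER_1606[14:17])

print(stop_codon, CODON_TABLE[stop_codon])
print(tail_codon, encoded_residues(tail_codon))
```

Under these assumptions each of the two tail positions can carry I, K, L, P, Q, R or T, which can be compared with the possibilities listed in Table 5.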

Chloramphenicol resistant Bacillus subtilis transformants were picked by a robotic 
colony picker from a bioassay plate and transferred into a 384 well microtiter plate (MTP) 
containing 0.05 X TY supplemented with 6 mg/l chloramphenicol (60 ul/well). The MTPs were 
incubated at 26°C for 72h. After incubation each well was analyzed for proteolytic activity. 

The thirty Bacillus subtilis transformants with highest proteolytic activity were selected 
for determination of the two tail amino acids in each transformant by DNA sequencing. The 
sequencing results are summarised in Table 4 and Table 5. 



AA tail    No. of transformants 
TL         4 
TT         4 
QL         3 
TP         3 
LP         3 
TI         2 
IQ         2 
QP         2 
PI         2 
LT         1 
TQ         1 
IT         1 
QQ         1 
PQ         1 
Total      30 



Table 4: column one shows the amino acid sequence of the tail, and column two shows the 
number of Bacillus subtilis transformants sequenced with that particular AA tail sequence. 

Possibilities    Result    Possibilities    Result 
position 1                 position 2 
K                0         K                0 
R                0         R                0 
T                14        T                6 
I                3         I                4 
Q                6         Q                5 
P                3         P                8 
L                4         L                7 
Total            30        Total            30 




Table 5: The table shows the amino acid which could be introduced by the primer used for the 
library construct and the actual findings by DNA sequencing of the thirty colonies isolated from 
screening. 
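
Tables 4 and 5 can be cross-checked against each other. The sketch below, in Python, is an illustrative consistency check rather than part of the original analysis. The dictionary transcribes Table 4; a count of 1 is assumed for the five tails whose count is not legible in the source, which is the only assignment consistent with the stated total of 30. The per-position tallies it produces can be compared with Table 5.

```python
from collections import Counter

# Tail sequences and transformant counts transcribed from Table 4.
# Assumption: LT, TQ, IT, QQ and PQ each occurred once (the total must be 30).
TABLE_4 = {
    "TL": 4, "TT": 4, "QL": 3, "TP": 3, "LP": 3, "TI": 2, "IQ": 2,
    "QP": 2, "PI": 2, "LT": 1, "TQ": 1, "IT": 1, "QQ": 1, "PQ": 1,
}


def position_counts(tails, position):
    """Count how often each amino acid occurs at one tail position."""
    counts = Counter()
    for tail, n_transformants in tails.items():
        counts[tail[position]] += n_transformants
    return dict(counts)


first = position_counts(TABLE_4, 0)
second = position_counts(TABLE_4, 1)
print(sum(TABLE_4.values()), first, second)
```

With these inputs the tallies for T, I, Q, P and L at both positions agree with the results given in Table 5.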



Example 6 

Construction of Bacillus subtilis strains L2, L2 HVO, L2 HV1 

A Bacillus subtilis strain was made analogously to the construction of the Bacillus 
subtilis strain 10RS, with the DNA coding for the pro-form of the S2A protease from Nocardiopsis 
dassonvillei subsp. dassonvillei DSM 43235, denoted L2, fused by PCR in frame to the DNA 
coding for the signal peptide from SAVINASE™ (a well-known commercial protease derived from 
Bacillus clausii, available from Novozymes, Denmark). The resulting strain was denoted Bacillus 
subtilis Sav-L2. 

The DNA sequence including the coding region for the pro-mature S2A protease from 
Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235, as amplified with primers 1423 and 
1475, is shown in SEQ ID NO: 25. The corresponding encoded pro-form amino acid sequence for 
the L2 protease is shown in SEQ ID NO: 28. 

1423 (SEQ ID NO: 26): gcttttagttcatcgatcgcatcggctgctccggcccccgtcccccag 
1475 (SEQ ID NO: 27): ggagcggattgaacatgcgattaggtccggatcctgacaccccag 

Two tail-variants of this construct were also made. Tail variant Sav-L2 HVO was 
constructed to have 8 amino acids extra in the C-terminus: QSHVQSAP (SEQ ID NO: 3), by using 
the DNA sequence extension inserted in front of the TAA stop codon which is shown in SEQ ID 
NO: 4. Tail variant Sav-L2 HV1 was constructed to have 4 amino acids extra in the C-terminus: 
QSAP (SEQ ID NO: 5), by using the DNA sequence extension inserted in front of the TAA 
stop codon which is shown in SEQ ID NO: 6. Both tail variants had the SAVINASE™ signal- 
peptide encoding sequence fused in frame with the pro-mature encoding sequence, just like in 
Sav-L2. 

The Sav-L2 gene and the two tail-variants Sav-L2 HVO and Sav-L2 HV1 were integrated 
by homologous recombination on the Bacillus subtilis MB1053 host cell genome as outlined 
above. Chloramphenicol resistant transformants were checked for protease activity on 1% skim 
milk LB-PG agar plates (supplemented with 6 ug/ml chloramphenicol). Some protease positive 
colonies were further analyzed by DNA sequencing of the insert to confirm the correct DNA 
sequence, and one strain for each construct was selected and denoted B.subtilis Sav-L2, 
B.subtilis Sav-L2 HVO, and B.subtilis Sav-L2 HV1, respectively. 

Example 7 

The three B. subtilis strains of Example 6 were fermented on a rotary shaking table in 
500 ml baffled Erlenmeyer flasks containing 100 ml TY supplemented with 6 mg/l 
chloramphenicol. Six Erlenmeyer flasks for each of the three B. subtilis strains were fermented in 
parallel. Two of the six Erlenmeyer flasks were incubated at 37°C (250 rpm), two at 30°C (250 
rpm), and the last two at 26°C (250 rpm). A sample was taken from each shake flask at day 1, 2 
and 3 and analyzed for proteolytic activity. The results are shown in Tables 6-8. As can be seen 
from Tables 6-8, the tails also increase the expression level of the Sav-L2 protease 
from Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235 when expressed in B. subtilis. An 
increase of up to 40% is observed in this experiment, and an overall improvement is observed for 
both tail-variants at all three temperatures tested. 

Table 6. Relative proteolytic activities at 37°C. 

               Day 1    Day 2    Day 3 
Sav-L2         1.0      1.0      1.0 
Sav-L2 HV1     1.4      1.3      1.2 
Sav-L2 HVO     1.3      1.1      1.4 


Table 7. Relative proteolytic activities at 30°C. 

               Day 1    Day 2    Day 3 
Sav-L2         1.0      1.0      1.0 
Sav-L2 HV1     1.0      1.2 
Sav-L2 HVO     1.1      1.3      1.3 

Table 8. Relative proteolytic activities at 26°C. 

               Day 1    Day 2    Day 3 
Sav-L2         1.0      1.0      1.0 
Sav-L2 HV1     1.3      1.1      1.1 
Sav-L2 HVO     0.2      1.1      1.1 
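
The "up to 40%" figure quoted in Example 7 follows from the largest relative activity in Tables 6-8. As a minimal, illustrative Python sketch (the dictionary is a transcription of the tables, with the one illegible value for Sav-L2 HV1 at 30°C, day 3, simply omitted; the names are not from the source):

```python
# Relative proteolytic activities transcribed from Tables 6-8 (Sav-L2 = 1.0).
# Keys are (strain, temperature in degrees C); values are days 1, 2, 3.
# The day-3 value for Sav-L2 HV1 at 30 C is not legible and is left out.
RELATIVE_ACTIVITY = {
    ("Sav-L2 HV1", 37): [1.4, 1.3, 1.2],
    ("Sav-L2 HVO", 37): [1.3, 1.1, 1.4],
    ("Sav-L2 HV1", 30): [1.0, 1.2],
    ("Sav-L2 HVO", 30): [1.1, 1.3, 1.3],
    ("Sav-L2 HV1", 26): [1.3, 1.1, 1.1],
    ("Sav-L2 HVO", 26): [0.2, 1.1, 1.1],
}


def percent_change(relative):
    """Convert a relative activity (reference = 1.0) to a percent change."""
    return round((relative - 1.0) * 100.0, 1)


best = max(v for values in RELATIVE_ACTIVITY.values() for v in values)
print(percent_change(best))
```

The same helper can be applied to any single table entry to express it as a percent change against the Sav-L2 reference.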



Example 8 

The DNA sequence coding for the pro-region from the L2 protease from Nocardiopsis 
dassonvillei subsp. dassonvillei DSM 43235 is shown in SEQ ID NO: 29, and the corresponding 
amino acid sequence is shown in SEQ ID NO: 30. A Bacillus subtilis strain denoted L210R, similar 
to the Bacillus subtilis strain 10RS but with the DNA coding for the pro-region of L2 replacing 
the pro-region of 10RS, was made. The entire L210R protease encoding sequence, including the 
pro-region of L2, is shown in SEQ ID NO: 31. 

Two tail variants of the above construct were also made. Tail variant HVO was 
constructed to have 8 amino acids extra in the C-terminus: QSHVQSAP (SEQ ID NO: 3) with the 
DNA shown in SEQ ID NO: 4 inserted in front of the TAA stop codon of the encoding sequence. 
Tail variant HV1 was constructed to have 4 amino acids extra in the C-terminus: QSAP (SEQ ID 
NO: 5) with the DNA sequence shown in SEQ ID NO: 6 inserted in front of the TAA stop codon of 
the encoding sequence. 

The L210R construct and the two tail variants were integrated by homologous 
recombination on the Bacillus subtilis MB1053 host cell genome. Chloramphenicol resistant 
transformants were checked for protease activity on 1% skim milk LB-PG agar plates 
(supplemented with 6 ug/ml chloramphenicol). Some protease positive colonies were further 
analyzed by DNA sequencing of the insert to confirm the correct DNA sequence, and a strain for 
each construct was selected, and denoted B.subtilis L210R, B.subtilis L210R HVO, and B.subtilis 
L210R HV1, respectively. 



Example 9 

The six B. subtilis strains 10RS, 10RS HVO, 10RS HV1, L210R, L210R HVO, and L210R 
HV1 were fermented on a rotary shaking table in 500 ml baffled Erlenmeyer flasks containing 100 
ml TY supplemented with 6 mg/l chloramphenicol. Six Erlenmeyer flasks for each of the B. subtilis 
strains were fermented in parallel. Two of the six Erlenmeyer flasks were incubated at 37°C (250 
rpm), two at 30°C (250 rpm), and the last two at 26°C (250 rpm). A sample was taken from each 
shake flask at day 1, 2 and 3 and analyzed for proteolytic activity. The results are shown in Figure 
1 and in Tables 9-11. As can be seen from the results, the exchange of the 
pro-region from 10R with the pro-region from the L2 protease resulted in a surprisingly high 
improvement in the expression level of the 10R protease as measured by proteolytic activity in 
the culture broth at 37°C. The effect is most pronounced in the two tail variants. 

Table 9. Relative proteolytic activities at 37°C. 

               Day 1    Day 2    Day 3 
10RS           1.0      1.0      1.0 
10RS HVO       3.7      8.9      3.5 
10RS HV1       3.9      8.5      4.3 
L210R          1.8      2.3      1.6 
L210R HVO      5.3      14.4     7.3 
L210R HV1      9.1      20.9     7.6 

Table 10. Relative proteolytic activities at 30°C. 

               Day 1    Day 2    Day 3 
10RS           1.0      1.0      1.0 
10RS HVO       2.8      3.1      4.3 
10RS HV1       3.6      3.6      4.9 
L210R          0.6      0.4      0.9 
L210R HVO      3.5      3.2      4.5 
L210R HV1      3.7      3.2      4.5 


Table 11. Relative proteolytic activities at 26°C. 

               Day 1    Day 2    Day 3 
10RS           1.0      1.0      1.0 
10RS HVO       2.6      3.0      2.8 
10RS HV1       3.7      3.3      3.1 
L210R          0.4      0.7      0.4 
L210R HVO      2.3      2.1      1.9 
L210R HV1      2.2      1.7      1.7 
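
To separate the contribution of the pro-region exchange from that of the tails, the 37°C values of Table 9 can be compared pairwise. The following Python sketch is illustrative only. It assumes the reading of Table 9 in which the values 5.3, 14.4, 7.3 belong to L210R HVO and 9.1, 20.9, 7.6 to L210R HV1, and it divides each L210R strain by the matching 10RS strain for each day.

```python
# Relative proteolytic activities at 37 C transcribed from Table 9 (10RS = 1.0).
# Assumption: row assignment of the two L210R tail variants as stated above.
TABLE_9 = {
    "10RS": [1.0, 1.0, 1.0],
    "10RS HVO": [3.7, 8.9, 3.5],
    "10RS HV1": [3.9, 8.5, 4.3],
    "L210R": [1.8, 2.3, 1.6],
    "L210R HVO": [5.3, 14.4, 7.3],
    "L210R HV1": [9.1, 20.9, 7.6],
}

# (strain with the 10R pro-region, matching strain with the L2 pro-region)
PAIRS = [("10RS", "L210R"), ("10RS HVO", "L210R HVO"), ("10RS HV1", "L210R HV1")]


def pro_region_gain(table, pairs):
    """Per-day ratio of each L2-pro-region strain to its 10R-pro-region partner."""
    return {
        new: [round(n / o, 2) for n, o in zip(table[new], table[old])]
        for old, new in pairs
    }


gains = pro_region_gain(TABLE_9, PAIRS)
for strain, ratios in gains.items():
    print(strain, ratios)
```

A ratio above 1 on a given day means that the L2 pro-region strain gave more activity than its 10RS counterpart carrying the same tail.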



Example 10 

Analogously with the above Examples 1 through 9, similar experiments are 
carried out with the proteases of the following Nocardiopsis strains: 

(a) Nocardiopsis dassonvillei NRRL 18133 as described in WO 88/03947; 

(b) Nocardiopsis sp. NRRL 18262 as described in WO 88/03947; the DNA and amino acid 
sequences of the protease derived from Nocardiopsis sp. NRRL 18262 are shown in DK 
patent application no. 1996 00013, and WO 01/58276 describes the use in animal feed of 
acid-stable proteases related to the protease derived from Nocardiopsis sp. NRRL 18262; 

(c) Nocardiopsis alba DSM 15647; the amino acid sequence of the protease is SEQ ID NO: 34, 
the encoding nucleotide sequence is SEQ ID NO: 33; the gene is isolated from the genomic 
DNA of this strain by PCR-amplification using the two primers: 

1421 (SEQ ID NO: 35): gttcatcgatcgcatcggc^^ 

1604 (SEQ ID NO: 36): gcggatcctatcaggtgcgcagggtcagacc. 

(d) Nocardiopsis prasina DSM 15648; the amino acid sequence of the protease is SEQ ID NO: 
38, the encoding nucleotide sequence is SEQ ID NO: 37; the gene is isolated from the 
genomic DNA of this strain by PCR-amplification using the two primers: 

1346 (SEQ ID NO: 39): gttcatcgatcgcatcggctgccaccggaccgctcccccagtc 

1602 (SEQ ID NO: 40): gcggatcctattaggtccggagacggacgccccaggag. 

(e) Nocardiopsis prasina DSM 15649; the amino acid sequence of the protease is SEQ ID NO: 
42, the encoding nucleotide sequence is SEQ ID NO: 41; the gene is isolated from the 
genomic DNA of this strain by PCR-amplification using the two primers: 

1603 (SEQ ID NO: 43): gttcatcgatcgcatcggctgccaccggaccactcccccagtc, and 1602 (SEQ ID 
NO: 40). 

Example 11 

Performance of the Nocardiopsis dassonvillei subspecies dassonvillei DSM 43235 
protease in a monogastric in vitro digestion model. The performance of a purified 
preparation of the mature part of the protease having SEQ ID NO: 28 (prepared as described 
above) was tested in an in vitro model simulating the digestion in monogastric animals. In 
particular, the protease was tested for its ability to improve solubilisation and digestion of 
maize/SBM (maize/soybean meal) proteins. In the tables below, this protease is designated 
"protease of the invention." 

The in vitro system consisted of 15 flasks in which maize/SBM substrate was initially 
incubated with HCl/pepsin - simulating gastric digestion - and subsequently with pancreatin - 
simulating intestinal digestion. 10 of the flasks were dosed with the protease at the start of the 
gastric phase whereas the remaining flasks served as blanks. At the end of the intestinal 
incubation phase samples of in vitro digesta were removed and analysed for solubilised and 
digested protein. 



Table 12: Outline of in vitro digestion procedure 

Components added                   pH    Temperature   Time course   Simulated digestion phase 
10 g maize/SBM substrate (6:4),    3.0   40°C          t=0 min       Mixing 
41 ml HCl (0.105 M) 
5 ml HCl (0.105 M)/pepsin          3.0   40°C          t=30 min      Gastric digestion 
(3000 U/g substrate), 1 mL 
protease of the invention 
16 ml H2O                          3.0   40°C          t=1.0 hour    Gastric digestion 
7 ml NaOH (0.39 M)                 6.8   40°C          t=1.5 hours   Intestinal digestion 
5 ml NaHCO3 (1 M)/pancreatin       6.8   40°C          t=2.0 hours   Intestinal digestion 
(8 mg/g diet) 
Terminate incubation               7.0   40°C          t=6.0 hours 



Conditions 
Substrate:    4 g SBM, 6 g maize (premixed) 
pH:           3.0 stomach step / 6.8-7.0 intestinal step 
HCl:          0.105 M for 1.5 hours (i.e. 30 min HCl-substrate premixing) 
Pepsin:       3000 U/g diet for 1 hour 
Pancreatin:   8 mg/g diet for 4 hours 
Temperature:  40°C 
Replicates:   5 



Solutions 
0.39 M NaOH 
0.105 M HCl 
0.105 M HCl containing 6000 U pepsin per 5 ml 
1 M NaHCO3 containing 16 mg pancreatin per ml 
125 mM NaAc-buffer, pH 6.0 
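
The pancreatin dose given in the conditions can be recovered from the stock solution and the volume added. A small Python sketch of this arithmetic follows; the function name is illustrative, and the inputs are the 16 mg/ml pancreatin stock, the 5 ml added, and the 10 g of substrate per flask stated in the outline.

```python
def dose_per_gram(stock_per_ml, volume_ml, substrate_g):
    """Amount of enzyme delivered per gram of substrate in one flask."""
    return stock_per_ml * volume_ml / substrate_g


# 5 ml of 1 M NaHCO3 containing 16 mg pancreatin per ml, added to 10 g substrate.
pancreatin_mg_per_g = dose_per_gram(stock_per_ml=16.0, volume_ml=5.0, substrate_g=10.0)
print(pancreatin_mg_per_g)
```

The result corresponds to the 8 mg pancreatin per g diet listed under Conditions.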

Enzyme protein determinations 

The amount of protease enzyme protein (in what follows, Enzyme Protein is abbreviated 
EP) is calculated on the basis of the values and the amino acid sequences (amino acid 
compositions) using the principles outlined in S.C. Gill & P.H. von Hippel, Analytical Biochemistry 
182, 319-326, (1989). 

Experimental procedure for in vitro model 

The experimental procedure was according to the above outline. pH was measured at time 
1, 2.5, and 5.5 hours. Incubations were terminated after 6 hours and samples of 30 ml were 
removed and placed on ice before centrifugation (10000 x g, 10 min, 4°C). Supernatants were 
removed and stored at -20°C. 



Analysis 

All samples were analysed for % degree of hydrolysis (DH) with the OPA method, as well as 
content of solubilised and digested protein using gel filtration. 

DH determination by the OPA method 

The degree of hydrolysis was determined using a microtiter plate based colorimetric method 
(Nielsen, P.M. [...]). The OPA reagent was prepared as follows: 7.620 g [...] tetraborate 
decahydrate [...]; the reagents were completely dissolved before continuing. 160 mg 
o-phthaldialdehyde 97% (OPA) [...]. [...] of sample, standard and blind was dispensed into a 
microtiter plate. The microtiter plate [...]. 

Estimation of solubilised and digested protein 

[...] crude protein (CP) using gel filtration HPLC. Supernatants were thawed, 
filtered through 0.45 µm polycarbonate filters and diluted (1:50, v/v) with H2O. Diluted samples 
were chromatographed by HPLC using a [...] Peptide PE (7.5 x 300 mm) gel filtration 
column (Global). The eluent used for isocratic elution was 50 mM [...] buffer (pH 
7.0) containing [...]. [...] the total area under the profiles was 
determined by integration. [...] a reference sample 
with known [...] content. The protein determination in this sample was carried 
out using the 
Kjeldahl method (determination of % nitrogen; A.O.A.C. (1984) Official Methods of 
Analysis 14th ed., Washington DC). 

The content of digested protein was estimated by integrating the chromatogram area 
corresponding to peptides and amino acids having a molecular mass of 1500 Dalton or below 
(Savoie,L; Gauthier.S.F. Dialysis Cell For The In-vitro Measurement Of Protein Digestibility. J. 
Food Sci. 1986. 51, 494-498; Babinszky.L; Van.D.M.J.M.; Boer.H.; Den.H.LA. An In-vitro Method 
for Prediction of The Digestible Crude Protein Content in Pig Feeds. J. Sci. Food Agr. 1990. 50, 
173-178; Boisen.S.; Eggum,B.O. Critical Evaluation of In-vitro Methods for Estimating Digestibility 
in Simple-Stomach Animals. Nutrition Research Reviews 1991, 4, 141-162). To determine the 
1500 Dalton dividing line, the gel filtration column was calibrated using cytochrome C (Boehringer, 
Germany), aprotinin, gastrin I, and substance P (Sigma Aldrich, USA), as molecular mass 
standards. 
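
The 1500 Dalton dividing line can be located on the chromatogram with a log-linear calibration of the column. The sketch below, in Python, illustrates one common way of doing this and is not taken from the source: the molecular masses of the four standards are approximate literature values, the elution volumes are hypothetical placeholders, and the trapezoidal integration only counts segments that start at or after the cutoff volume.

```python
import math

# Approximate molecular masses (Da) of the four standards named in the text.
# The elution volumes (mL) are HYPOTHETICAL placeholders, not measured values.
STANDARDS = {
    "cytochrome C": (12400.0, 12.0),
    "aprotinin": (6500.0, 13.5),
    "gastrin I": (2100.0, 16.0),
    "substance P": (1350.0, 17.0),
}


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


log_mass = [math.log10(mass) for mass, _ in STANDARDS.values()]
volume = [vol for _, vol in STANDARDS.values()]
slope, intercept = fit_line(log_mass, volume)


def elution_volume(mass_da):
    """Predicted elution volume (mL) for a given molecular mass."""
    return slope * math.log10(mass_da) + intercept


cutoff_volume = elution_volume(1500.0)


def digested_fraction(trace, cutoff):
    """Fraction of total trapezoidal area eluting at or after the cutoff.

    trace is a list of (volume_mL, absorbance) points in elution order.
    """
    total = 0.0
    late = 0.0
    for (v0, a0), (v1, a1) in zip(trace, trace[1:]):
        area = 0.5 * (a0 + a1) * (v1 - v0)
        total += area
        if v0 >= cutoff:
            late += area
    return late / total


print(round(cutoff_volume, 2))
```

Smaller molecules elute later on a gel filtration column, so the area at or after `cutoff_volume` corresponds to peptides and amino acids of about 1500 Dalton or below.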



Results 

The results shown in Tables 13 and 14 below indicate that the protease increased the 
Degree of Hydrolysis (DH), as well as soluble and digestible protein, significantly. 

Table 13: Degree of Hydrolysis (DH), absolute and relative values 

Enzyme                            Of total protein     Relative to blank 
(dosage in mg EP/kg feed)         %DH      SD          %DH      %CV 
Blank                             26.84    0.69        100.0    2.57 
Protease of the invention (100)   28.21    0.35        105.1    1.25 

Different letters within the same column indicate significant differences (1-way ANOVA, Tukey- 
Kramer test, P<0.05). SD = Standard Deviation. %CV = Coefficient of Variance = (SD/mean value) 
x 100% 
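
The relative values and coefficients of variance in Table 13 follow from the absolute values by the formula given in the table footnote. A minimal Python sketch of that arithmetic (function and variable names are illustrative; inputs are the tabulated %DH and SD values):

```python
def relative_to_blank(value, blank):
    """Express a value as a percentage of the blank value."""
    return 100.0 * value / blank


def coefficient_of_variation(sd, mean):
    """%CV = (SD / mean value) x 100%."""
    return 100.0 * sd / mean


BLANK = {"dh": 26.84, "sd": 0.69}
PROTEASE = {"dh": 28.21, "sd": 0.35}

relative_dh = relative_to_blank(PROTEASE["dh"], BLANK["dh"])
cv_blank = coefficient_of_variation(BLANK["sd"], BLANK["dh"])
cv_protease = coefficient_of_variation(PROTEASE["sd"], PROTEASE["dh"])
print(round(relative_dh, 1), round(cv_blank, 2), round(cv_protease, 2))
```

Recomputing from the rounded table entries reproduces the relative %DH and the blank %CV; the protease %CV differs from the tabulated value only in the last digit, as would be expected if the table was computed from unrounded data.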



Table 14: Solubilised and digested crude protein measured by AKTA HPLC. 

Enzyme (dosage in                 n   Of total protein                    Relative to blank 
mg EP/kg feed)                        %dig. CP  SD    %sol. CP  SD        %dig. CP  CV%   %sol. CP  CV% 
Blank                             5   54.1 a    1.1   90.1 a    1.1       100.0 a   2.0   100.0 a   1.2 
Protease of the invention (50)    5   57.7 b    1.1   93.2 b    1.4       106.7 b   1.9   103.4 b   1.5 
Protease of the invention (100)   5   58.9 b    0.8   94.8 b    0.9       108.9 b   1.3   105.2 b   0.9 

Different letters within the same column indicate significant differences (1-way ANOVA, Tukey- 
Kramer test, P<0.05). SD = Standard Deviation. %CV = Coefficient of Variance = (SD/mean value) 
x 100% 



Example 12 

Performance of the protease from Nocardiopsis dassonvillei subsp. dassonvillei DSM 
43235 in an aquaculture in vitro model. The protease preparation as described in Example 3 was 
tested in an aquaculture in vitro model simulating the digestion in coldwater fish. The in vitro 
system consisted of 15 flasks in which SBM substrate was initially incubated with HCI/pepsin - 
simulating gastric digestion - and subsequently with pancreatin - simulating intestinal digestion. 
10 of the flasks were dosed with the protease at the start of the gastric phase whereas the 
remaining 5 flasks served as blanks. At the end of the intestinal incubation phase samples of in 
vitro digesta were removed and analysed for solubilised and digested protein. 

Outline of aqua in vitro digestion procedure 

Components added                   pH    Temperature   Time course   Simulated digestion phase 
10 g extruded SBM substrate,       3.0   15°C          t=0 min       Gastric digestion 
62 mL HCl (0.155 M)/pepsin 
(4000 U/g substrate), 1 mL of 
the protease of the invention 
7 mL NaOH (1.1 M)                  6.8   15°C          t=6 hours     Intestinal digestion 
5 mL NaHCO3 (1 M)/pancreatin       6.8   15°C          t=7 hours     Intestinal digestion 
(8 mg/g diet) 
Terminate incubation               7.0   15°C          t=24 hours 



Conditions 
Substrate:    10 g extruded SBM 
pH:           3.0 stomach step / 6.8-7.0 intestinal step 
HCl:          0.155 M for 6 hours 
Pepsin:       4000 U/g diet for 6 hours 
Pancreatin:   8 mg/g diet for 17 hours 
Temperature:  15°C 
Replicates:   5 

Solutions 
1.1 M NaOH 
0.155 M HCl / pepsin (4000 U/g diet) 
1 M NaHCO3 containing 16 mg pancreatin/mL 
125 mM NaAc-buffer, pH 6.0 



Experimental procedure for aqua in vitro model 

The experimental procedure was according to the above outline. pH was measured at time 
1, 5, 8 and 23 hours. Incubations were terminated after 24 hours and samples of 30 mL were 
removed and placed on ice before centrifugation (13000 x g, 10 min, 0°C). Supernatants were 
removed and stored at -20°C. 

Analysis 

All supernatants were analysed using the OPA method (% degree of hydrolysis) and by AKTA 
HPLC to determine solubilised and digested protein (see monogastric example). 

Pre-treatment of in vitro supernatants with EASY SPE columns 

Before analysis on AKTA HPLC, supernatants from the in vitro system were pretreated 
using solid-phase sample purification. This was done to improve the chromatography and thereby 
prevent unstable elution profiles and baselines. The columns used for extraction were solid phase 
extraction columns (Chromabond EASY SPE Columns from Macherey-Nagel). 2 mL milliQ water 
was eluted through the columns by use of a vacuum chamber (vacuum 0.15 x 100 kPa). 
Subsequently 3 mL in vitro sample was dispensed onto the column and eluted (vacuum 0.1 x 100 
kPa); the first ½ mL of eluted sample was discarded and a clean tube was placed beneath the 
column, after which the rest of the sample was eluted and saved for further dilution. 

Results 

The results shown in Tables 15 and 16 below indicate that the protease significantly 
increased Degree of hydrolysis and protein digestibility. 

Table 15: Degree of Hydrolysis (DH) measured by the OPA method, absolute and relative values 

Enzyme                           n   Of total protein     Relative to blank 
(mg EP/kg diet)                      %DH       SD         %DH       %CV 
Blank                            5   21.30 a   0.52       100.0 a   2.42 
Protease of the invention (50)   5   21.98 b   0.22       103.2     1.00 

Different letters within the same column indicate significant differences (1-way ANOVA, Tukey- 
Kramer test, P<0.05). SD = Standard Deviation. %CV = Coefficient of Variance = (SD/mean value) 
x 100% 



Table 16: Solubilised and digested crude protein measured by AKTA HPLC, absolute and relative 
values 

Enzyme                            N   Of total protein                   Relative to blank 
(mg EP/kg diet)                       %CP dig  SD    %CP sol  SD         %CP dig   %CV   %CP sol   %CV 
Blank                             5   50.0 a   2.2   89.9 a   3.2        100.0 a   4.5   100.0 a   3.5 
Protease of the invention (50)    5   52.3 b   1.1   91.4 a   1.5        104.8 b   2.1   101.7 a   1.6 
Protease of the invention (100)   5   53.4 b   0.4   91.6 a   1.0        107.0 b   0.7   101.9 a   1.1 

Different letters within the same column indicate significant differences (1-way ANOVA, Tukey- 
Kramer test, P<0.05). SD = Standard Deviation. %CV = Coefficient of Variance = (SD/mean value) 
x 100%. 

CLAIMS 

1. A secreted fusion polypeptide which has alpha-lytic endopeptidase activity, which polypeptide 
comprises a heterologous pro-region, and which polypeptide: 

(a) comprises an amino acid sequence which is at least 70% identical to the amino acid 
sequence of the mature part of the polypeptide shown in SEQ ID NO: 28; SEQ ID NO: 
33; SEQ ID NO: 37; or SEQ ID NO: 41; 

(b) comprises an amino acid sequence which is at least 70% identical to the amino acid 
sequence of the mature part of the polypeptide encoded by the polynucleotide in SEQ 
ID NO: 1; SEQ ID NO: 2; SEQ ID NO: 25; SEQ ID NO: 31; SEQ ID NO: 32; SEQ ID 
NO: 36; or SEQ ID NO: 40; 

(c) comprises a mature part which is a variant of the mature part of the polypeptide having 
the amino acid sequence of SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; or SEQ 
ID NO: 41, the segment comprising a substitution, deletion, extension, and/or insertion 
of one or more amino acids; 

(d) is an allelic variant of (a), (b), or (c); 

(e) is a fragment of (a), (b), (c), or (d). 

2. The polypeptide according to claim 1, wherein the heterologous pro-region is derived from a 
protease, preferably the pro-region is derived from an S2A or S1E protease, and most preferably it 
is at least 70% identical to the pro-region shown in SEQ ID NO: 30. 

3. The polypeptide according to claim 1 or 2, which mature part is an artificial variant of a wildtype 
polypeptide, said variant having one or more amino acid(s) added to the C-terminus as compared 
to the wildtype. 

4. The polypeptide according to any of claims 1 - 3, which comprises at least three non-polar or 
uncharged polar amino acids within the last four amino acids of the C-terminus of the polypeptide. 

5. The polypeptide according to claim 3, wherein the one or more added amino acid(s) is (are) 
non-polar or uncharged. 

6. The polypeptide according to claim 5, wherein the one or more added amino acid(s) is one or 
more of Q, S, V, A, or P. 

7. The polypeptide according to claim 4, wherein the one or more added amino acids are selected 
from the group consisting of: QSHVQSAP, QSAP, QP, TL, TT, QL, TP, LP, TI, IQ, QP, PI, LT, TQ, 
IT, QQ, and PQ. 

8. The polypeptide according to any of claims 1 - 7 which comprises a heterologous secretion 
signal-peptide which is cleaved from the polypeptide when the polypeptide is secreted, preferably 
the heterologous secretion signal peptide is derived from a heterologous protease. 

9. The polypeptide according to claim 8, wherein the heterologous secretion signal peptide 
comprises an amino acid sequence having a sequence identity of at least 70% with the amino 
acid sequence encoded by polynucleotides 1 - 81 of SEQ ID NO: 2. 

10. An isolated polynucleotide encoding a polypeptide as defined in any of claims 1-9. 

11. A recombinant expression vector or polynucleotide construct comprising a polynucleotide as 
defined in claim 10. 

12. A recombinant host cell comprising a polynucleotide as defined in claim 10, or an expression 
vector or polynucleotide construct as defined in claim 11.

13. The recombinant host cell according to claim 12 which is a Bacillus cell. 

14. A transgenic plant, or plant part, comprising a polynucleotide as defined in claim 10, or an 
expression vector or polynucleotide construct as defined in claim 11.

15. A transgenic, non-human animal, or products, or elements thereof, comprising a 
polynucleotide as defined in claim 10, or an expression vector or polynucleotide construct as 
defined in claim 11. 

16. A method for producing a polypeptide as defined in any of claims 1 - 9, the method 
comprising: (a) cultivating a recombinant host cell as defined in claim 12 or 13, or a transgenic 
plant or animal as defined in claims 14 or 15, to produce a supernatant comprising the 
polypeptide, and optionally (b) recovering the polypeptide. 
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17. An animal feed additive comprising at least one polypeptide as defined in any of claims 1 - 9; 
and 

(a) at least one fat-soluble vitamin, and/or 

(b) at least one water-soluble vitamin, and/or 

(c) at least one trace mineral. 

18. An animal feed composition having a crude protein content of 50 to 800 g/kg and comprising 
at least one polypeptide as defined in any of claims 1 - 9, or at least one feed additive of claim 17.

19. A composition comprising at least one polypeptide as defined in any of claims 1 - 9, together
with at least one other enzyme selected from amongst phytase (EC 3.1.3.8 or 3.1.3.26); xylanase
(EC 3.2.1.8); galactanase (EC 3.2.1.89); alpha-galactosidase (EC 3.2.1.22); protease (EC 3.4.-.-);
phospholipase A1 (EC 3.1.1.32); phospholipase A2 (EC 3.1.1.4); lysophospholipase (EC 3.1.1.5);
phospholipase C (EC 3.1.4.3); phospholipase D (EC 3.1.4.4); and/or beta-glucanase (EC 3.2.1.4 or
EC 3.2.1.6).



20. A method for using at least one polypeptide as defined in any of claims 1 - 9, for improving the
nutritional value of an animal feed, for increasing digestible and/or soluble protein in animal diets,
for increasing the degree of hydrolysis of proteins in animal diets, and/or for the treatment of
vegetable proteins, the method comprising including the polypeptide(s) in animal feed, and/or in a
composition for use in animal feed.



21. A method for using at least one polypeptide as defined in any of claims 1 - 9, comprising
including the polypeptide(s) in a detergent formulation. 
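
The C-terminal criterion of claims 4 to 7 can be illustrated with a short check. The following is a minimal sketch only and forms no part of the claimed subject matter. The function names and the residue grouping are assumptions: the claims do not enumerate which amino acids count as charged, so D, E, K, R and H are treated as charged here and all other standard residues as non-polar or uncharged polar. The string `GVRLRT` is the translation of the last six codons before the stop codon of SEQ ID NO: 1.

```python
# Assumed grouping (not taken from the claims): D, E, K, R and H are charged;
# every other standard residue is non-polar or uncharged polar.
CHARGED = set("DEKRH")
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def count_uncharged_tail(sequence: str, window: int = 4) -> int:
    """Count non-polar or uncharged polar residues in the last `window` positions."""
    tail = sequence.upper()[-window:]
    unknown = set(tail) - STANDARD
    if unknown:
        raise ValueError(f"non-standard residue(s): {sorted(unknown)}")
    return sum(1 for aa in tail if aa not in CHARGED)


def meets_claim_4(sequence: str) -> bool:
    """True if at least three of the last four residues are non-polar or uncharged."""
    return count_uncharged_tail(sequence) >= 3


# C-terminal end translated from the last six codons of SEQ ID NO: 1.
WILDTYPE_END = "GVRLRT"

# A few of the added tails named in claim 7.
for added in ["", "QP", "QSAP", "QSHVQSAP", "TL"]:
    variant = WILDTYPE_END + added
    print(f"{added or '(none)':>9}: last four = {variant[-4:]}, "
          f"uncharged = {count_uncharged_tail(variant)}, "
          f"meets claim 4 = {meets_claim_4(variant)}")
```

Under this assumed grouping the unmodified end has only two uncharged residues in its last four positions, whereas each of the listed tails brings the count to three or four.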
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ABSTRACT 

A secreted fusion polypeptide which has alpha-lytic endopeptidase activity, which 
polypeptide comprises a heterologous pro-region, and which polypeptide is derived from the 
mature part of an S2A or S1E protease.
SEQUENCE LISTING

<110> Novozymes A/S

<120> Improved proteases and methods for producing them
<130> 10495.000-DK
<160> 42

<170> PatentIn version 3.2

<210> 1 
<211> 1062 
<212> DNA 

<213> Nocardiopsis sp. NRRL 18262 
<400> 1 

gctactggag cattacctca gtctcctaca cctgaagcag atgcagtatc gatgcaagaa 60 

gcattacaac gtgatcttga tcttacatca gctgaagctg aggaattact tgctgcacaa 120 

gatacagcct ttgaagttga tgaagctgcc gctgaagcag ctggtgatgc atatggtggt 180 

tcagtattcg atactgaatc actcgaactt actgtactag tgaccgatgc agcagctgtt 240 

gaagctgttg aagccacagg tgcaggtaca gagctcgtat cttatggtat tgatggatta 300 

gatgagatcg tacaagagct taatgcagct gatgccgttc caggtgtagt tggatggtat 360 

cctgatgtag caggtgatac tgttgtctta gaagttcttg aaggctctgg agctgatgtt 420 

tctggacttt tagcagacgc aggagtcgat gcatccgcgg ttgaagtgac cacgtcagat 480 

cagcctgaac tctatgccga tatcattgga ggcctagcgt acacaatggg tggtcgctgc 540 

agcgtaggat ttgcagccac aaatgcagct ggacaacctg gcttcgtgac agctggacat 600 

tgcggccgcg tcggtacaca ggttactatc ggcaatggaa gaggtgtctt tgagcaaagc 660 

gtatttcccg ggaatgatgc tgccttcgtt agaggtacgt ccaactttac gcttactaac 720 

ttagtatcta gatacaacac tggcggatat gcaactgtag caggtcacaa tcaagcacct 780 

attggctcta gcgtctgccg ctcagggtcg actacaggat ggcattgtgg aaccattcaa 840 

gctagaggtc agagcgtgag ctatcctgaa ggtaccgtaa cgaacatgac tcgtacgact 900 

gtatgtgcag aaccaggtga ctctggaggt tcatatatca gcggtacgca agcgcaaggc 960 

gttacctcag gtggatccgg taactgtagg acaggtggca caacgttcta ccaggaagtg 1020 

acaccgatgg tgaactcttg gggagttaga ctccgtacat aa 1062 

<210> 2 
<211> 1143 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> [...] synt-15) encoding a S2A protease denoted [...] fused by PCR in frame to the
signal peptide encoding sequence of a heterologous protease, Savinase.

<400> 2 

atgaagaaac cgttggggaa aattgtcgca agcaccgcac tactcatttc tgttgctttt 60 

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agttcatcga tcgcatcggc tgctactgga gcattacctc agtctcctac acctgaagca 120 

gatgcagtat cgatgcaaga agcattacaa cgtgatcttg atcttacatc agctgaagct 180 

gaggaattac ttgctgcaca agatacagcc tttgaagttg atgaagctgc cgctgaagca 240 

gctggtgatg catatggtgg ttcagtattc gatactgaat cactcgaact tactgtacta 300 

gtgaccgatg cagcagctgt tgaagctgtt gaagccacag gtgcaggtac agagctcgta 360 

tcttatggta ttgatggatt agatgagatc gtacaagagc ttaatgcagc tgatgccgtt 420 

ccaggtgtag ttggatggta tcctgatgta gcaggtgata ctgttgtctt agaagttctt 480 

gaaggctctg gagctgatgt ttctggactt ttagcagacg caggagtcga tgcatccgcg 540 

gttgaagtga ccacgtcaga tcagcctgaa ctctatgccg atatcattgg aggcctagcg 600 

tacacaatgg gtggtcgctg cagcgtagga tttgcagcca caaatgcagc tggacaacct 660 

ggcttcgtga cagctggaca ttgcggccgc gtcggtacac aggttactat cggcaatgga 720 

agaggtgtct ttgagcaaag cgtatttccc gggaatgatg ctgccttcgt tagaggtacg 780 

tccaacttta cgcttactaa cttagtatct agatacaaca ctggcggata tgcaactgta 840 

gcaggtcaca atcaagcacc tattggctct agcgtctgcc gctcagggtc gactacagga 900 

tggcattgtg gaaccattca agctagaggt cagagcgtga gctatcctga aggtaccgta 960 

acgaacatga ctcgtacgac tgtatgtgca gaaccaggtg actctggagg ttcatatatc 1020 

agcggtacgc aagcgcaagg cgttacctca ggtggatccg gtaactgtag gacaggtggc 1080 

acaacgttct accaggaagt gacaccgatg gtgaactctt ggggagttag actccgtaca 1140 

taa 1143 

<210> 3 

<211> 8 

<212> PRT

<213> Artificial sequence 
<220> 

<223> C-terminal amino acid tail expressed as fusion to protease of the
invention.

<400> 3

Gln Ser His Val Gln Ser Ala Pro
1 5

<210> 4 

<211> 24 

<212> DNA

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a C-terminal amino acid tail expressed as 
fusion to protease of the invention. 

<400> 4 

caatcgcatg ttcaatccgc tcca 24 
<210> 5 
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<211> 4 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> C-terminal amino acid tail expressed as fusion to protease of the
invention.

<400> 5

Gln Ser Ala Pro
1

<210> 6 

<211> 12 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a C-terminal amino acid tail expressed as
fusion to protease of the invention. 

<400> 6 

caatcggctc ct 12 

<210> 7 

<211> 2 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> C-terminal amino acid tail expressed as fusion to protease of the
invention.

<400> 7

Gln Pro
1 

<210> 8 
<211> 6 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a C-terminal amino acid tail expressed as 
fusion to protease of the invention. 

<400> 8 

caacca 6 

<210> 9 

<211> 1 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> C-terminal amino acid tail expressed as fusion to protease of the
invention.

<400> 9

Pro
1
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<210> 10 

<211> 3 

<212> DNA

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a C-terminal amino acid tail expressed as
fusion to protease of the invention.

<400> 10

cca 3
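
The pairing between the polynucleotide tails (SEQ ID NOs: 4, 6, 8 and 10) and the amino acid tails (SEQ ID NOs: 3, 5, 7 and 9) can be checked by translation. The following is a minimal sketch, assuming the standard genetic code. The helper `translate` and the `TAILS` table are illustrative names and are not taken from the listing.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring whitespace and any trailing partial codon."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# (DNA SEQ ID NO, DNA as listed, protein SEQ ID NO, listed tail in one-letter code)
TAILS = [
    (4, "caatcgcatg ttcaatccgc tcca", 3, "QSHVQSAP"),
    (6, "caatcggctc ct", 5, "QSAP"),
    (8, "caacca", 7, "QP"),
    (10, "cca", 9, "P"),
]

for dna_id, dna, prot_id, listed in TAILS:
    encoded = translate(dna)
    status = "ok" if encoded == listed else "MISMATCH"
    print(f"SEQ ID NO: {dna_id} encodes {encoded}; SEQ ID NO: {prot_id} lists {listed} [{status}]")
```

Each polynucleotide translates to the tail listed under the preceding SEQ ID NO.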



<210> 11 

<211> 45 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #252639 

<400> 11 

catgtgcatg tgggtaccgc aacgttcgca gatgctgctg aagag 45

<210> 12

<211> 44

<212> DNA

<213> Artificial sequence
<220>

<223> Primer #251992

<400> 12

catgtgcatg tggtcgaccg attatggagc ggattgaaca tgcg 44

<210> 13
<211> 44
<212> DNA
<213> Artificial sequence
<220>
<223> Primer #179541



<400> 13 

gcgttgagac gcgcggccgc gagcgccgtt tggctgaatg atac 44

<210> 14

<211> 43

<212> DNA

<213> Artificial sequence
<220>

<223> Primer #179542

<400> 14

gcgttgagac agctcgagca gggaaaaatg gaaccgcttt ttc 43

<210> 15

<211> 64

<212> DNA

<213> Artificial sequence
<220>
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<223> Primer #179539 
<400> 15 

ccatttgatc agaattcact ggccgtcgtt ttacaaccat tgcggaaaat agtcataggc 
atcc 



<210> 16 

<211> 60 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #179540 

<400> 16 

ggatccagat ctggtacccg ggtctagagt cgacgcggcg gttcgcgtcc ggacagcaca 

<210> 17 

<211> 37

<212> DNA 

<213> Artificial sequence 
<220> 

<223> primer #179154 

<400> 17 

gttgtaaaac gacggccagt gaattctgat caaatgg 

<210> 18 

<211> 37 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #179153 

<400> 18 

ccgcgtcgac actagacacg ggtacctgat ctagatc 

<210> 19 

<211> 22 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> primer #317 

<400> 19 

tggcgcaatc ggtaccatgg gg 

<210> 20 
<211> 40 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #139 NotI
<400> 20 

catgtgcatg cggccgcatt aacgcgttgc cgcttctgcg 
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<210> 21 
<211> 7443 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Sequence of plasmid pMB1508 
<400> 21 

tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca 60 

cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg 120 

ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc 180 

accatatgcg gtgtgaaata ccgcacagat gcgtaaggag aaaataccgc atcaggcgcc 240 

attcgccatt caggctgcgc aactgttggg aagggcgatc ggtgcgggcc tcttcgctat 300 

tacgccagct ggcgaaaggg ggatgtgctg caaggcgatt aagttgggta acgccagggt 360 

tttcccagtc acgacgttgt aaaacgacgg ccagtgaatt cgataaaagt gctttttttg 420 

ttgcaattga agaattatta atgttaagct taattaaaga taatatcttt gaattgtaac 480 

gcccctcaaa agtaagaact acaaaaaaag aatacgttat atagaaatat gtttgaacct 540 

tcttcagatt acaaatatat tcggacggac tctacctcaa atgcttatct aactatagaa 600 

tgacatacaa gcacaacctt gaaaatttga aaatataact accaatgaac ttgttcatgt 660 

gaattatcgc tgtatttaat tttctcaatt caatatataa tatgccaata cattgttaca 720 

agtagaaatt aagacaccct tgatagcctt actataccta acatgatgta gtattaaatg 780 

aatatgtaaa tatatttatg ataagaagcg acttatttat aatcattaca tatttttcta 840 

ttggaatgat taagattcca atagaatagt gtataaatta tttatcttga aaggagggat 900 

gcctaaaaac gaagaacatt aaaaacatat atttgcaccg tctaatggat ttatgaaaaa 960 

tcattttatc agtttgaaaa ttatgtatta tggagctctg aaaaaaagga gaggataaag 1020 

aatgaagaaa ccgttgggga aaattgtcgc aagcaccgca ctactcattt ctgttgcttt 1080 

tagttcatcg atcgcatcgg ctgctgaaga agcaaaagaa aaatatttaa ttggctttaa 1140 

tgagcaggaa gctgtcagtg agtttgtaga acaagtagag gcaaatgacg aggtcgccat 1200 

tctctctgag gaagaggaag tcgaaattga attgcttcat gaatttgaaa cgattxctgt 1260 

tttatccgtt gagttaagcc cagaagatgt ggacgcgctt gaactcgatc cagcgatttc 1320 

ttatattgaa gaggatgcag aagtaacgac aatggcgcaa tcggtaccat ggggtatatc 1380 

aacgcgttaa tccgcggata tatagcggcc gcagatctgg gaccaataat aatgactaga 1440 

gaagaaagaa tgaagattgt tcatgaaatt aaggaacgaa tattggataa agtgggatat 1500 

ttttaaaata tatatttatg ttacagtaat attgactttt aaaaaaggat tgattctaat 1560 

gaagaaagca gacaagtaag cctcctaaat tcactttaga taaaaattta ggaggcatat 1620 

caaatgaact ttaataaaat tgatttagac aattggaaga gaaaagagat atttaatcat 1680 

tatttgaacc aacaaacgac ttttagtata accacagaaa ttgatattag tgttttatac 1740 

cgaaacataa aacaagaagg atataaattt taccctgcat ttattttctt agtgacaagg 1800 

gtgataaact caaatacagc ttttagaact ggttacaata gcgacggaga gttaggttat 1860 

tgggataagt tagagccact ttatacaatt tttgatggtg tatctaaaac attctctggt 1920 

atttggactc ctgtaaagaa tgacttcaaa gagttttatg atttatacct ttctgatgta 1980 

gagaaatata atggttcggg gaaattgttt cccaaaacac ctatacctga aaatgctttt 2040 

tctctttcta ttattccatg gacttcattt actgggttta acttaaatat caataataat 2100 

agtaattacc ttctacccat tattacagca ggaaaattca ttaataaagg taattcaata 2160 

tatttaccgc tatctttaca ggtacatcat tctgtttgtg atggttatca tgcaggattg 2220 

tttatgaact ctattcagga attgtcagat aggcctaatg actggctttt ataatatgag 2280 

ataatgccga ctgtactttt tacagtcggt tttctaacga tacattaata ggtacgaaaa 2340 

agcaactttt tttgcgctta aaaccagtca taccaataac ttaagggtaa ctagccrcgc 2400 

cggaaagagc gaaaatgcct cacatttgtg ccacctaaaa aggagcgatt tacatatgag 2460 

ttatgcagtt tgtagaatgc aaaaagtgaa atcagctgga ctaaaagggg ccgcagagta 2520 

gaatggaaaa ggggatcgga aaacaagtat ataggaggag acctatttat ggcttcagaa 2580 

aaagacgcag gaaaacagtc agcagtaaag cttgttccat tgcttattac tgtcgctgtg 2640 

ggactaatca tctggtttat tcccgctccg tccggacttg aacctaaagc ttggcatttg 2700 

tttgcgattt ttgtcgcaac aattatcggc tttatctcca agcccttgcc aatgggtgca 2760 

attgcaattt ttgcattggc ggttactgca ctaactggaa cactatcaat tgaggataca 2820 

ttaagcggat tcgggaataa gaccatttgg cttatcgtta tcgcattctt tatttcccgg 2880 

ggatttatca aaaccggtct cggtgcgaga atttcgtatg tattcgttca gaaattcgga 2940 

aaaaaaaccc ttggactttc ttattcactg ctattcagtg atttaatact ttcacctgct 3000 

attccaagta atacggcgcg tgcaggaggc attatatttc ctattatcag atcattatcc 3060 

gaaacattcg gatcaagccc ggcaaatgga acagagagaa aaatcggtgc attcttatta 3120 

aaaaccggtt ttcaggggaa tctgatcaca tctgctatgt tcctgacagc gatggcggcg 3180 

aacccgctga ttgccaagct ggcccatgat gtcgcagggg tggacttaac atggacaagc 3240 

tgggcaattg ccgcgattgt accgggactt gtaagcttaa tcatcacgcc gcttgtgatt 3300 

tacaaactgt atccgccgga aatcaaagaa acaccggatg cggcgaaaat cgcaacagaa 3360 

aaactgaaag aaatgggacc gttcaaaaaa tcggagcttt ccatggttat cgtgtttctt 3420 

ttggtgcttg tgctgtggat ttttggcggc agcttcaaca tcgacgctac cacaaccgca 3480 

ttgatcggtt tggccgttct cttattatca caagttctga cttgggatga tatcaagaaa 3540 

gaacagggcg cttgggatac gctcacttgg tttgcggcgc ttgtcatgct cgccaacttc 3600 

ttgaatgaat taggcatggt gtcttggttc agtaatgcca tgaaatcatc cgtatcaggg 3660 

ttctcttgga ttgtggcatt catcatttta attgttgtgt attattactc tcactatttc 3720 

tttgcaagtg cgacagccca catcagtgcg atgtattcag catttttggc tgtcgtcgtg 3780 
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gcagcgggcg caccgccgct tttagcagcg ctgagcctcg cgttcatcag caacctgttc 3840 

gggtcaacga ctcactacgg ttctggagcg gctccggtct tcttcggagc aggctacatc 3900 

ccgcaaggca aatggtggtc catcggattt atcctgtcga ttgttcatat catcgtatgg 3960 

cttgtgatcg gcggattatg gtggaaagta ctaggaatat ggtagaaaga aaaaggcaga 4020 

cgcggtctgc ctttttttat tttcactcct tcgtaagaaa atggattttg aaaaatgaga 4080 

aaattccctg tgaaaaatgg tatgatctag gtagaaagga cggctggtgc tgtggtgaaa 4140 

aagcggttcc atttttccct gcaaacaaaa ataatggggc tgattgcggc tctgctggtc 4200 

tttgtcattg gtgtgctgac cattacgtta gccgttcagc atacacaggg agaacggaga 4260 

caggcagagc agctggcggt tcaaacggcg agaaccattt cctatatgcc gccggttaaa 4320 

gagctcattg agagaaaaga cggacatgcg gctcagacgc aagaggtcat tgaacaaatg 4380 

aaagaacaga ctggtgcgtt tgccatttat gttttgaacg aaaaaggaga cattcgcagc 4440 

gcctctggaa aaagcggatt aaagaaactg gagcgcagca gagaaatttt gtttggcggt 4500 

tcgcatgttt ctgaaacaaa agcggatgga cgaagagtga tcagagggag cgcgccgatt 4560 

ataaaagaac agaagggata cagccaagtg atcggcagcg tgtctgttga ttttctgcaa 4620 

acggagacag agcaaagcat caaaaagcat ttgagaaatt tgagtgtgat tgctgtgctt 4680 

gtactgctgc tcggatttat tggcgccgcc gtgctggcga aaagcatcag aaaggatacg 4740 

ctcgggcttg aaccgcatga gatcgcggct ctatatcgtg agaggaacgc aatgcttttc 4800 

gcgattcgag aagggattat tgccaccaat cgtgaaggcg tcgtcaccat gatgaacgta 4860 

tcggcggccg agatgctgaa gctgcccgag cctgtgatcc atcttcctat agatgacgtc 4920 

atgccgggag cagggctgat gtctgtgctt gaaaaaggag aaatgctgcc gaaccaggaa 4980 

gtaagcgtca acgatcaagt gtttattatc aatacgaaag tgatgaatca aggcgggcag 5040 

gcgtatggga ttgtcgtcag cttcagggag aaaacagagc tgaagaagct gatcgacaca 5100 

ttgacagagg ttcgcaaata ttcagaggat ctcagggcgc agactcatga attttcaaat 5160 

aagctttatg cgattttagg gctgcgtcga cctgcaggca tgcaagcttg gcgtaatcat 5220 

ggtcatagct gtttcctgtg tgaaattgtt atccgctcac aattccacac aacatacgag 5280 

ccggaagcat aaagtgtaaa gcctggggtg cctaatgagt gagctaactc acattaattg 5340 

cgttgcgctc actgcccgct ttccagtcgg gaaacctgtc gtgccagctg cattaatgaa 5400 

tcggccaacg cgcggggaga ggcggtttgc gtattgggcg ctcttccgct tcctcgctca 5460 

ctgactcgct gcgctcggtc gttcggctgc ggcgagcggt atcagctcac tcaaaggcgg 5520 

taatacggtt atccacagaa tcaggggata acgcaggaaa gaacatgtga gcaaaaggcc 5580 

agcaaaaggc caggaaccgt aaaaaggccg cgttgctggc gtttttccat aggctccgcc 5640 

cccctgacga gcatcacaaa aatcgacgct caagtcagag gtggcgaaac ccgacaggac 5700 

tataaagata ccaggcgttt ccccctggaa gctccctcgt gcgctctcct gttccgaccc 5760 

tgccgcttac cggatacctg tccgcctttc tcccttcggg aagcgtggcg ctttctcata 5820 
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gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg ggctgtgtgc 5880 

acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt cttgagtcca 5940 

acccggtaag acacgactta tcgccactgg cagcagccac tggtaacagg attagcagag 6000 

cgaggtatgt aggcggtgct acagagttct tgaagtggtg gcctaactac ggctacacta 6060 

gaaggacagt atttggtatc tgcgctctgc tgaagccagt taccttcgga aaaagagttg 6120 

gtagctcttg atccggcaaa caaaccaccg ctggtagcgg tggttttttt gtttgcaagc 6180 

agcagattac gcgcagaaaa aaaggatctc aagaagatcc tttgatcttt tctacggggt 6240 

ctgacgctca gtggaacgaa aactcacgtt aagggatttt ggtcatgaga ttatcaaaaa 6300 

ggatcttcac ctagatcctt ttaaattaaa aatgaagttt taaatcaatc taaagtatat 6360 

atgagtaaac ttggtctgac agttaccaat gcttaatcag tgaggcacct atctcagcga 6420 

tctgtctatt tcgttcatcc atagttgcct gactccccgt cgtgtagata actacgatac 6480 

gggagggctt accatctggc cccagtgctg caatgatacc gcgagaccca cgctcaccgg 6540 

caccggattt atcagcaata aaccagccag ccggaagggc cgagcgcaga agtggtcctg 6600 

caactttatc cgcctccatc cagtctatta attgttgccg ggaagctaga gtaagtagtt 6660 

cgccagttaa tagtttgcgc aacgttgttg ccattgctac aggcatcgtg gtgtcacgct 6720 

cgtcgtttgg tatggcttca ttcagctccg gttcccaacg atcaaggcga gttacatgat 6780 

cccccatgtt gtgcaaaaaa gcggttagct ccttcggtcc tccgatcgtt gtcagaagta 6840 

agttggccgc agtgttatca ctcatggtta tggcagcact gcataattct cttactgtca 6900 

tgccatccgt aagatgcttt tctgtgactg gtgagtactc aaccaagtca ttctgagaat 6960 

agtgtatgcg gcgaccgagt tgctcttgcc cggcgtcaat acgggataat accgcgccac 7020 

atagcagaac tttaaaagtg ctcatcattg gaaaacgttc ttcggggcga aaactctcaa 7080 

ggatcttacc gctgttgaga tccagttcga tgtaacccac tcgtgcaccc aactgatctt 7140 

cagcatcttt tactttcacc agcgtttctg ggtgagcaaa aacaggaagg caaaatgccg 7200 

caaaaaaggg aataagggcg acacggaaat gttgaatact catactcttc ctttttcaat 7260 

attattgaag catttatcag ggttattgtc tcatgagcgg atacatattt gaatgtattt 7320 

agaaaaataa acaaataggg gttccgcgca catttccccg aaaagtgcca cctgacgtct 7380 

aagaaaccat tattatcatg acattaacct ataaaaatag gcgtatcacg aggccctttc 7440 

gtc 7443

<210> 22 
<211> 5718 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> sequence of MB1510 genomic integration region 
<400> 22 

gagcgccgtt tggctgaatg atacaacagt ctcacttcct tactgcgtct ggttgcaaaa 60 
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acgaagaagc aaggattccc etcgcttctx aVttgtcct a S tttattatac acttttttaa 120 

gcacatcttt ggcgcttgtt tcactagact tgatgcctct gaatcttgtc caagtgtcac 180 

ggtccgcatc atagacttgt ccatttttca ccgctttgag atttttccag agcgggttcg 240 

ttttccactc atctacaatg gttttgcctt cgttggctga gatgaacaaa atatcaggat 300 

cgattttgct caattgctca aggctgacct cttgataggc gttatctgac ttcacagcgt 360 

gtgtaaagcc tagcatttta aagatttctc cgtcatagga tgatgatgta tgaagctgga 420 

aggaatccgc tcttgcaacg ccgagaacga tgttgcggtt ttcatctttc ggaagttcgg 480 

cttttagatc gttgatgact tttttgtgct cggcaagctt ttcttttcct tcatcttctt 540 

tatttaatgc tttagcaatg gtcgtaaagc tgtcgatcgt ttcgtcatat gtcgcttcac 600 

ggctttttaa ttcaatcgtc ggggcgattt ttttcagctg tttataaatg tttttatggc 660 

gctcagcgtc agcgatgatt aaatcaggct tcaaggaact gatgacctca agattgggtt 720 

cgctgcgtgt gcctacagat gtgtaatcaa tggagctgcc gacaagcttt ttaatcatat 780 

cttttttgtt gtcatctgcg atgcccaccg gcgtaatgcc gagattgtga acggcatcca 840 

agaatgaaag ctcaagcaca accacccgct taggtgtgcc gcttactgtc gtttttcctt 900 

cttcgtcatg gatcactctg gaatccttag actcgctttt gccgcttccg ttgttattct 960 

ggcttgatga acagccggat acaatgaggc aggcgagcaa taaaacactc atgatggcaa 1020 

tcaacttgtt agaataggtg cgcatgtcat tcttcctttt ttcagattta gtaatgagaa 1080 

tcattatcac atgtaacact ataatagcat ggcttatcat gtcaatattt ttttagtaaa 1140 

gaaagctgcg tttttactgc tttctcatga aagcatcatc agacacaaat aagtggtatg 1200 

cagcgttacc gtgtcttcga gacaaaaacg catgggcgtt ggctttagag gtttcgaaca 1260 

tatcagcagt gacataagga aggagagtgc tgagataacc ggacaatttc ttttctattt 1320 

catctgttag tgcaaattca atgtcgccga tattcatgat aatcgagaaa acaaagtcga 1380 

tatcgatatg aaaatgttcc tcggcaaaaa ccgcaagctc gtgaattcct ggtgaacatc 1440 

cggcacgctt atggaaaatc tgtttgacta aatcactcac aatccaagca ttgtattgct 1500 

gttctggtga aaagtattgc attagacata cctcctgctc gtacggataa aggcagcgtt 1560 

tcatggtcgt gtgctccgtg cagcggcttc tccttaattt tgatttttct gaaaataggt 1620 

cccgttccta tcactttacc atggacggaa aacaaatagc tactaccatt cctcctgttt 1680 

ttctcttcaa tgttctggaa tctgtttcag gtacagacga tcgggtatga aagaaatata 1740 

gaaaacatga aggaggaata tcgacatgaa accagttgta aaagagtata caaatgacga 1800 

acagctcatg aaagatgtag aggaattgca gaaaatgggt gttgcgaaag aggatgtata 1860 

cgtcttagct cacgacgatg acagaacgga acgcctggct gacaacacga acgccaacac 1920 

gatcggagcc aaagaaacag gtttcaagca cgcggtggga aatatcttca ataaaaaagg 1980 

agacgagctc cgcaataaaa ttcacgaaat cggtttttct gaagatgaag ccgctcaatt 2040 

tgaaaaacgc ttagatgaag gaaaagtgct tctctttgtg acagataacg aaaaagtgaa 2100 
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agcttgggca taaagcaagg aaaaaaccaa aaggccaatg tcggcctttt ggtttttttg 2160 

cggtctttgc ggtgggattt tgcagaatgc cgcaatagga tagcggaaca ttttcggttc 2220 

tgaatgtccc tcaatttgct attatatttt tgtgataaat tggaataaaa tctcacaaaa 2280 

tagaaaatgg gggtacatag tggatgaaaa aagtgatgtt agctacggct ttgtttttag 2340 

gattgactcc agctggcgcg aacgcagctg atttaggcca ccagacgttg ggatccaatg 2400 

atggctgggg cgcgtactcg accggcacga caggcggatc aaaagcatcc tcctcaaatg 2460 

tgtataccgt cagcaacaga aaccagcttg tctcggcatt agggaaggaa acgaacacaa 2520 

cgccaaaaat catttatatc aagggaacga ttgacatgaa cgtggatgac aatctgaagc 2580 

cgcttggcct aaatgactat aaagatccgg agtatgattt ggacaaatat ttgaaagcct 2640 

atgatcctag cacatggggc aaaaaagagc cgtcgggaac acaagaagaa gcgagagcac 2700 

gctctcagaa aaaccaaaaa gcacgggtca tggtggatat ccctgcaaac acgacgatcg 2760 

tcggttcagg gactaacgct aaagtcgtgg gaggaaactt ccaaatcaag agtgataacg 2820 

tcattattcg caacattgaa ttccaggatg cctatgacta ttttccgcaa tggttgtaaa 2880 

acgacggcca gtgaattctg atcaaatggt tcagtgagag cgaagcgaac acttgatttt 2940 

ttaattttct atcttttata ggtcattaga gtatacttat ttgtcctata aactatttag 3000 

cagcataata gatttattga ataggtcatt taagttgagc atattagagg aggaaaatct 3060 

tggagaaata tttgaagaac ccgagatcta gatcaggtac cgcaacgttc gcagatgctg 3120 

ctgaagagat tattaaaaag ctgaaagcaa aaggctatca attggtaact gtatctcagc 3180 

ttgaagaagt gaagaagcag agaggctatt gaataaatga gtagaaagcg ccatatcggc 3240 

gcttttcttt tggaagaaaa tatagggaaa atggtacttg ttaaaaattc ggaatattta 3300 

tacaatatca tatgtatcac attgaaagga ggggcctgct gtccagactg tccgctgtgt 3360 

aaaaataagg aataaagggg ggttgacatt attttactga tatgtataat ataatttgta 3420 

taagaaaatg gaggggccct cgaaacgtaa gatgaaacct tagataaaag tgcttttttt 3480 

gttgcaattg aagaattatt aatgttaagc ttaattaaag ataatatctt tgaattgtaa 3540 

cgcccctcaa aagtaagaac tacaaaaaaa gaatacgtta tatagaaata tgtttgaacc 3600 

ttcttcagat tacaaatata ttcggacgga ctctacctca aatgcttatc taactataga 3660 

atgacataca agcacaacct tgaaaatttg aaaatataac taccaatgaa cttgttcatg 3720 

tgaattatcg ctgtatttaa ttttctcaat tcaatatata atatgccaat acattgttac 3780 

aagtagaaat taagacaccc ttgatagcct tactatacct aacatgatgt agtattaaat 3840 

gaatatgtaa atatatttat gataagaagc gacttattta taatcattac atatttttct 3900 

attggaatga ttaagattcc aatagaatag tgtataaatt atttatcttg aaaggaggga 3960 

tgcctaaaaa cgaagaacat taaaaacata tatttgcacc gtctaatgga tttatgaaaa 4020 

atcattttat cagtttgaaa attatgtatt atggagctct gaaaaaaagg agaggataaa 4080 

gagaaaaggg gatcggaaaa caagtatata ggaggagacc tatttatggc ttcagaaaaa 4140 
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gacgcaggaa aacagtcagc agtaaagctt gttccattgc ttattactgt cgctgtggga 4200 

ctaatcatct ggtttattcc cgctccgtcc ggacttgaac ctaaagcttg gcatttgttt 4260 

gcgatttttg tcgcaacaat tatcggcttt atctccaagc ccttgccaat gggtgcaatt 4320 

gcaatttttg cattggcggt tactgcacta actggaacac tatcaattga ggatacatta 4380 

agcggattcg ggaataagac catttggctt atcgttatcg cattctttat ttcccgggga 4440 

tttatcaaaa ccggtctcgg tgcgagaatt tcgtatgtat tcgttcagaa attcggaaaa 4500 

aaaacccttg gactttctta ttcactgcta ttcagtgatt taatactttc acctgctatt 4560 

ccaagtaata cggcgcgtgc aggaggcatt atatttccta ttatcagatc attatccgaa 4620 

acattcggat caagcccggc aaatggaaca gagagaaaaa tcggtgcatt cttattaaaa 4680 

accggttttc aggggaatct gatcacatct gctatgttcc tgacagcgat ggcggcgaac 4740 

ccgctgattg ccaagctggc ccatgatgtc gcaggggtgg acttaacatg gacaagctgg 4800 

gcaattgccg cgattgtacc gggacttgta agcttaatca tcacgccgct tgtgatttac 4860 

aaactgtatc cgccggaaat caaagaaaca ccggatgcgg cgaaaatcgc aacagaaaaa 4920 

ctgaaagaaa tgggaccgtt caaaaaatcg gagctttcca tggttatcgt gtttcttttg 4980 

gtgcttgtgc tgtggatttt tggcggcagc ttcaacatcg acgctaccac aaccgcattg 5040 

atcggtttgg ccgttctctt attatcacaa gttctgactt gggatgatat caagaaagaa 5100 

cagggcgctt gggatacgct cacttggttt gcggcgcttg tcatgctcgc caacttcttg 5160 

aatgaattag gcatggtgtc ttggttcagt aatgccatga aatcatccgt atcagggttc 5220 

tcttggattg tggcattcat cattttaatt gttgtgtatt attactctca ctatttcttt 5280 

gcaagtgcga cagcccacat cagtgcgatg tattcagcat ttttggctgt cgtcgtggca 5340 

gcgggcgcac cgccgctttt agcagcgctg agcctcgcgt tcatcagcaa cctgttcggg 5400 

tcaacgactc actacggttc tggagcggct ccggtcttct tcggagcagg ctacatcccg 5460 

caaggcaaat ggtggtccat cggatttatc ctgtcgattg ttcatatcat cgtatggctt 5520 

gtgatcggcg gattatggtg gaaagtacta ggaatatggt agaaagaaaa aggcagacgc 5580 

ggtctgcctt tttttatttt cactccttcg taagaaaatg gattttgaaa aatgagaaaa 5640 

ttccctgtga aaaatggtat gatctaggta gaaaggacgg ctggtgctgt ggtgaaaaag 5700 

cggttccatt tttccctg 5718 

<210> 23 
<211> 27 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1605 
<400> 23 

gacggccagt gaattcgata aaagtgc 27 
<210> 24 
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<211> 42 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1606 
<220> 

<221> misc_feature

<222> (13)..(13)

<223> n is a, c, g, or t

<220>

<221> misc_feature

<222> (16)..(16)

<223> n is a, c, g, or t 

<400> 24 

ccagatctct atnktnktgt acggagtcta actccccaag ag 42 
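
The two degenerate positions of Primer 1606 can be read as a small randomised tail. The following is a minimal sketch, assuming the standard genetic code and assuming that the primer anneals antisense to the 3' end of the coding sequence of SEQ ID NO: 1, so that its reverse complement reads in the coding direction. The reading-frame offset of 1 is likewise an assumption, inferred by aligning that reverse complement with the final codons of SEQ ID NO: 1. All helper names are illustrative.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC codes that occur in the primer (k = g or t, m = a or c, n = any base).
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "k": "gt", "m": "ac", "n": "acgt"}
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a", "k": "m", "m": "k", "n": "n"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase sequence, including the ambiguity codes above."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def expand(codon: str) -> list:
    """List every concrete codon matched by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in codon))]


PRIMER_1606 = "ccagatctct atnktnktgt acggagtcta actccccaag ag".replace(" ", "")
sense = reverse_complement(PRIMER_1606)

FRAME_OFFSET = 1  # assumption: first base of `sense` is the last base of a codon
codons = [sense[i:i + 3] for i in range(FRAME_OFFSET, len(sense) - 2, 3)]
degenerate = [c for c in codons if set(c) - set("acgt")]
residues = {CODON_TABLE[c.upper()] for d in degenerate for c in expand(d)}

print("coding-strand reading:", sense)
print("in-frame codons:", " ".join(codons))
print("degenerate codons:", degenerate)
print("residues each degenerate codon can encode:", sorted(residues))
```

Read this way, the primer places two randomised codons directly after the last sense codon, followed by a `tag` stop codon. The two-residue tails recited in claim 7 use only Q, P, T, L and I, all of which fall within the set this sketch derives.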

<210> 25 
<211> 1112 
<212> DNA 

<213> Nocardiopsis dassonvillei DSM 43235 
<400> 25 

gcttttagtt catcgatcgc atcggctgct ccggcccccg tcccccagac ccccgtcgcc 60 

gacgacagcg ccgccagcat gaccgaggcg ctcaagcgcg acctcgacct cacctcggcc 120 

gaggccgagg agcttctctc ggcgcaggaa gccgccatcg agaccgacgc cgaggccacc 180 

gaggccgcgg gcgaggccta cggcggctca ctgttcgaca ccgagaccct cgaactcacc 240 

gtgctggtca ccgacgcctc cgccgtcgag gcggtcgagg ccaccggagc ccaggccacc 300 

gtcgtctccc acggcaccga gggcctgacc gaggtcgtgg aggacctcaa cggcgccgag 360 

gttcccgaga gcgtcctcgg ctggtacccg gacgtggaga gcgacaccgt cgtggtcgag 420 

gtgctggagg gctccgacgc cgacgtcgcc gccctgctcg ccgacgccgg tgtggactcc 480 

tcctcggtcc gggtggagga ggccgaggag gccccgcagg tctacgccga catcatcggc 540 

ggcctggcct actacatggg cggccgctgc tccgtcggct tcgccgcgac caacagcgcc 600 

ggtcagcccg gtttcgtcac cgccggccac tgcggcaccg tcggcaccgg cgtgaccatc 660 

ggcaacggca ccggcacctt ccagaactcg gtcttccccg gcaacgacgc cgccttcgtc 720 

cgcggcacct ccaacttcac cctgaccaac ctggtctcgc gctacaactc cggcggctac 780 

cagtcggtga ccggtaccag ccaggccccg gccggctcgg ccgtgtgccg ctccggctcc 840 

accaccggct ggcactgcgg caccatccag gcccgcaacc agaccgtgcg ctacccgcag 900 

ggcaccgtct actcgctcac ccgcaccaac gtgtgcgccg agcccggcga ctccggcggt 960 

tcgttcatct ccggctcgca ggcccagggc gtcacctccg gcggctccgg caactgctcc 1020 

gtcggcggca cgacctacta ccaggaggtc accccgatga tcaactcctg gggtgtcagg 1080 

atccggacct aatcgcatgt tcaatccgct cc 1112 

<210> 26 
<211> 48
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<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1423 

<400> 26 

gcttttagtt catcgatcgc atcggctgct ccggcccccg tcccccag 

<210> 27 

<211> 45 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1475 

<400> 27 

ggagcggatt gaacatgcga ttaggtccgg atcctgacac cccag 

<210> 28 

<211> 354 

<212> PRT

<213> Nocardiopsis dassonvillei DSM 43235 

<400> 28 

Ala Pro Ala Pro Val Pro Gln Thr Pro Val Ala Asp Asp Ser Ala Ala
1 5 10 15

Ser Met Thr Glu Ala Leu Lys Arg Asp Leu Asp Leu Thr Ser Ala Glu
20 25 30

Ala Glu Glu Leu Leu Ser Ala Gln Glu Ala Ala Ile Glu Thr Asp Ala
35 40 45

Glu Ala Thr Glu Ala Ala Gly Glu Ala Tyr Gly Gly Ser Leu Phe Asp
50 55 60

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ala Ser Ala Val
65 70 75 80

Glu Ala Val Glu Ala Thr Gly Ala Gln Ala Thr Val Val Ser His Gly
85 90 95

Thr Glu Gly Leu Thr Glu Val Val Glu Asp Leu Asn Gly Ala Glu Val
100 105 110

Pro Glu Ser Val Leu Gly Trp Tyr Pro Asp Val Glu Ser Asp Thr Val
115 120 125

Val Val Glu Val Leu Glu Gly Ser Asp Ala Asp Val Ala Ala Leu Leu
130 135 140

Ala Asp Ala Gly Val Asp Ser Ser Ser Val Arg Val Glu Glu Ala Glu
145 150 155 160

Glu Ala Pro Gln Val Tyr Ala Asp Ile Ile Gly Gly Leu Ala Tyr Tyr
165 170 175

Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn Ser Ala Gly
180 185 190

Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly Thr Val Gly Thr Gly
195 200 205

Val Thr Ile Gly Asn Gly Thr Gly Thr Phe Gln Asn Ser Val Phe Pro
210 215 220

Gly Asn Asp Ala Ala Phe Val Arg Gly Thr Ser Asn Phe Thr Leu Thr
225 230 235 240

Asn Leu Val Ser Arg Tyr Asn Ser Gly Gly Tyr Gln Ser Val Thr Gly
245 250 255

Thr Ser Gln Ala Pro Ala Gly Ser Ala Val Cys Arg Ser Gly Ser Thr
260 265 270

Thr Gly Trp His Cys Gly Thr Ile Gln Ala Arg Asn Gln Thr Val Arg
275 280 285

Tyr Pro Gln Gly Thr Val Tyr Ser Leu Thr Arg Thr Asn Val Cys Ala
290 295 300

Glu Pro Gly Asp Ser Gly Gly Ser Phe Ile Ser Gly Ser Gln Ala Gln
305 310 315 320

Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Ser Val Gly Gly Thr Thr
325 330 335

Tyr Tyr Gln Glu Val Thr Pro Met Ile Asn Ser Trp Gly Val Arg Ile
340 345 350

Arg Thr



<210> 29 
<211> 498 
<212> DNA 

<213> Nocardiopsis dassonvillei DSM 43235 
<400> 29 

gctccggccc ccgtccccca gacccccgtc gccgacgaca gcgccgccag catgaccgag 60 
gcgctcaagc gcgacctcga cctcacctcg gccgaggccg aggagcttct ctcggcgcag 120 
gaagccgcca tcgagaccga cgccgaggcc accgaggccg cgggcgaggc ctacggcggc 180 
tcactgttcg acaccgagac cctcgaactc accgtgctgg tcaccgacgc ctccgccgtc 240 
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gaggcggtcg aggccaccgg agcccaggcc accgtcgtct cccacggcac cgagggcctg 300 

accgaggtcg tggaggacct caacggcgcc gaggttcccg agagcgtcct cggctggtac 360 

ccggacgtgg agagcgacac cgtcgtggtc gaggtgctgg agggctccga cgccgacgtc 420 

gccgccctgc tcgccgacgc cggtgtggac tcctcctcgg tccgggtgga ggaggccgag 480 

gaggccccgc aggtctac 498 

<210> 30 
<211> 166 
<212> PRT 

<213> Nocardiopsis dassonvillei DSM 43235
<400> 30 

Ala Pro Ala Pro Val Pro Gln Thr Pro Val Ala Asp Asp Ser Ala Ala
1 5 10 15

Ser Met Thr Glu Ala Leu Lys Arg Asp Leu Asp Leu Thr Ser Ala Glu
20 25 30

Ala Glu Glu Leu Leu Ser Ala Gln Glu Ala Ala Ile Glu Thr Asp Ala
35 40 45

Glu Ala Thr Glu Ala Ala Gly Glu Ala Tyr Gly Gly Ser Leu Phe Asp
50 55 60

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ala Ser Ala Val
65 70 75 80

Glu Ala Val Glu Ala Thr Gly Ala Gln Ala Thr Val Val Ser His Gly
85 90 95

Thr Glu Gly Leu Thr Glu Val Val Glu Asp Leu Asn Gly Ala Glu Val
100 105 110

Pro Glu Ser Val Leu Gly Trp Tyr Pro Asp Val Glu Ser Asp Thr Val
115 120 125

Val Val Glu Val Leu Glu Gly Ser Asp Ala Asp Val Ala Ala Leu Leu
130 135 140

Ala Asp Ala Gly Val Asp Ser Ser Ser Val Arg Val Glu Glu Ala Glu
145 150 155 160

Glu Ala Pro Gln Val Tyr
165

<210> 31 
<211> 1146 
<212> DNA 

<213> Artificial sequence 
<220> 
<223> The DNA sequence coding for the pro-region of SEQ ID NO: 29 fused

<400> 31 

atgaagaaac cgttggggaa aattgtcgca agcaccgcac tactcatttc tgttgctttt 60 

agttcatcga tcgcatcggc tgctccggcc cccgtccccc agacccccgt cgccgacgac 120 

agcgccgcca gcatgaccga ggcgctcaag cgcgacctcg acctcacctc ggccgaggcc 180 

gaggagcttc tctcggcgca ggaagccgcc atcgagaccg acgccgaggc caccgaggcc 240 

gcgggcgagg cctacggcgg ctcactgttc gacaccgaga ccctcgaact caccgtgctg 300 

gtcaccgacg cctccgccgt cgaggcggtc gaggccaccg gagcccaggc caccgtcgtc 360 

tcccacggca ccgagggcct gaccgaggtc gtggaggacc tcaacggcgc cgaggttccc 420 

gagagcgtcc tcggctggta cccggacgtg gagagcgaca ccgtcgtggt cgaggtgctg 480 

gagggctccg acgccgacgt cgccgccctg ctcgccgacg ccggtgtgga ctcctcctcg 540 

gtccgggtgg aggaggccga ggaggccccg caggtctatg ccgatatcat tggaggccta 600 

gcgtacacaa tgggtggtcg ctgcagcgta ggatttgcag ccacaaatgc agctggacaa 660 

cctggcttcg tgacagctgg acattgcggc cgcgtcggta cacaggttac tatcggcaat 720 

ggaagaggtg tctttgagca aagcgtattt cccgggaatg atgctgcctt cgttagaggt 780 

acgtccaact ttacgcttac taacttagta tctagataca acactggcgg atatgcaact 840 

gtagcaggtc acaatcaagc acctattggc tctagcgtct gccgctcagg gtcgactaca 900 

ggatggcatt gtggaaccat tcaagctaga ggtcagagcg tgagctatcc tgaaggtacc 960 

gtaacgaaca tgactcgtac gactgtatgt gcagaaccag gtgactctgg aggttcatat 1020 

atcagcggta cgcaagcgca aggcgttacc tcaggtggat ccggtaactg taggacaggt 1080 

ggcacaacgt tctaccagga agtgacaccg atggtgaact cttggggagt tagactccgt 1140 

acataa 1146 

<210> 32 
<211> 1068 
<212> DNA 

<213> Nocardiopsis alba DSM 15647 
<400> 32 

gcgaccggcc ccctccccca gtcccccacc ccggatgaag ccgaggccac caccatggtc 60 

gaggccctcc agcgcgacct cggcctgtcc ccctctcagg ccgacgagct cctcgaggcg 120 

caggccgagt ccttcgagat cgacgaggcc gccaccgcgg ccgcagccga ctcctacggc 180 

ggctccatct tcgacaccga cagcctcacc ctgaccgtcc tggtcaccga cgcctccgcc 240 

gtcgaggcgg tcgaggccgc cggcgccgag gccaaggtgg tctcgcacgg catggagggc 300 

ctggaggaga tcgtcgccga cctgaacgcg gccgacgctc agcccggcgt cgtgggctgg 360 

taccccgaca tccactccga cacggtcgtc ctcgaggtcc tcgagggctc cggtgccgac 420 

gtggactccc tgctcgccga cgccggtgtg gacaccgccg acgtcaaggt ggagagcacc 480 
accgagcagc ccgagctgta cgccgacatc atcggcggtc tcgcctacac catgggtggg 540 

cgctgctcgg tcggcttcgc ggccaccaac gcctccggcc agcccgggtt cgtcaccgcc 600 

ggccactgcg gcaccgtcgg caccccggtc agcatcggca acggccaggg cgtcttcgag 660 

cgttccgtct tccccggcaa cgactccgcc ttcgtccgcg gcacctcgaa cttcaccctg 720 

accaacctgg tcagccgcta caacaccggt ggttacgcga ccgtctccgg ctcctcgcag 780 

gcggcgatcg gctcgcagat ctgccgttcc ggctccacca ccggctggca ctgcggcacc 840 

gtccaggccc gcggccagac ggtgagctac ccccagggca ccgtgcagaa cctgacccgc 900 

accaacgtct gcgccgagcc cggtgactcc ggcggctcct tcatctccgg cagccaggcc 960 

cagggcgtca cctccggtgg ctccggcaac tgctccttcg gtggcaccac ctactaccag 1020 

gaggtcaacc cgatgctgag cagctggggt ctgaccctgc gcacctga 1068 

<210> 33 
<211> 355 
<212> PRT 

<213> Nocardiopsis alba DSM 15647 
<400> 33 

Ala Thr Gly Pro Leu Pro Gln Ser Pro Thr Pro Asp Glu Ala Glu Ala 
1 5 10 15 

Thr Thr Met Val Glu Ala Leu Gln Arg Asp Leu Gly Leu Ser Pro Ser 
20 25 30 

Gln Ala Asp Glu Leu Leu Glu Ala Gln Ala Glu Ser Phe Glu Ile Asp 
35 40 45 

Glu Ala Ala Thr Ala Ala Ala Ala Asp Ser Tyr Gly Gly Ser Ile Phe 
50 55 60 

Asp Thr Asp Ser Leu Thr Leu Thr Val Leu Val Thr Asp Ala Ser Ala 
65 70 75 80 

Val Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His 
85 90 95 

Gly Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp 
100 105 110 

Ala Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile His Ser Asp Thr 
115 120 125 

Val Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Asp Ser Leu 
130 135 140 

Leu Ala Asp Ala Gly Val Asp Thr Ala Asp Val Lys Val Glu Ser Thr 
145 150 155 160 

Thr Glu Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly Gly Leu Ala Tyr 
165 170 175 

Thr Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn Ala Ser 
180 185 190 

Gly Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly Thr Val Gly Thr 
195 200 205 

Pro Val Ser Ile Gly Asn Gly Gln Gly Val Phe Glu Arg Ser Val Phe 
210 215 220 

Pro Gly Asn Asp Ser Ala Phe Val Arg Gly Thr Ser Asn Phe Thr Leu 
225 230 235 240 

Thr Asn Leu Val Ser Arg Tyr Asn Thr Gly Gly Tyr Ala Thr Val Ser 
245 250 255 

Gly Ser Ser Gln Ala Ala Ile Gly Ser Gln Ile Cys Arg Ser Gly Ser 
260 265 270 

Thr Thr Gly Trp His Cys Gly Thr Val Gln Ala Arg Gly Gln Thr Val 
275 280 285 

Ser Tyr Pro Gln Gly Thr Val Gln Asn Leu Thr Arg Thr Asn Val Cys 
290 295 300 

Ala Glu Pro Gly Asp Ser Gly Gly Ser Phe Ile Ser Gly Ser Gln Ala 
305 310 315 320 

Gln Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Ser Phe Gly Gly Thr 
325 330 335 

Thr Tyr Tyr Gln Glu Val Asn Pro Met Leu Ser Ser Trp Gly Leu Thr 
340 345 350 

Leu Arg Thr 
355 



<210> 34 

<211> 43 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1421 

<400> 34 

gttcatcgat cgcatcggct gcgaccggcc ccctccccca gtc 43 

<210> 35 

<211> 31 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1604 



<400> 35 

gcggatccta tcaggtgcgc agggtcagac c 31 

<210> 36 
<211> 1062 
<212> DNA 

<213> Nocardiopsis prasina DSM 15648 
<400> 36 

gccaccggac cgctccccca gtcacccacc ccggaggccg acgccgtctc catgcaggag 60 

gcgctccagc gcgacctcgg cctgaccccg cttgaggccg atgaactgct ggccgcccag 120 

gacaccgcct tcgaggtcga cgaggccgcg gccgcggccg ccggggacgc ctacggcggc 180 

tccgtcttcg acaccgagac cctggaactg accgtcctgg tcaccgacgc cgcctcggtc 240 

gaggctgtgg aggccaccgg cgcgggtacc gaactcgtct cctacggcat cgagggcctc 300 

gacgagatca tccaggatct caacgccgcc gacgccgtcc ccggcgtggt cggctggtac 360 

ccggacgtgg cgggtgacac cgtcgtcctg gaggtcctgg agggttccgg agccgacgtg 420 

agcggcctgc tcgccgacgc cggcgtggac gcctcggccg tcgaggtgac cagcagtgcg 480 

cagcccgagc tctacgccga catcatcggc ggtctggcct acaccatggg cggccgctgt 540 

tcggtcggat tcgcggccac caacgccgcc ggtcagcccg gattcgtcac cgccggtcac 600 

tgtggccgcg tgggcaccca ggtgagcatc ggcaacggcc agggcgtctt cgagcagtcc 660 

atcttcccgg gcaacgacgc cgccttcgtc cgcggcacgt ccaacttcac gctgaccaac 720 

ctggtcagcc gctacaacac cggcggttac gccaccgtcg ccggccacaa ccaggcgccc 780 

atcggctcct ccgtctgccg ctccggctcc accaccggct ggcactgcgg caccatccag 840 

gcccgcggcc agtcggtgag ctaccccgag ggcaccgtca ccaacatgac ccggaccacc 900 

gtgtgcgccg agcccggcga ctccggcggc tcctacatct ccggcaacca ggcccagggc 960 

gtcacctccg gcggctccgg caactgccgc accggcggga ccaccttcta ccaggaggtc 1020 

acccccatgg tgaactcctg gggcgtccgt ctccggacct aa 1062 

<210> 37 

<211> 353 

<212> PRT 

<213> Nocardiopsis prasina DSM 15648 

<400> 37 

Ala Thr Gly Pro Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val 
1 5 10 15 

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Gly Leu Thr Pro Leu Glu 
20 25 30 

Ala Asp Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu 
35 40 45 

Ala Ala Ala Ala Ala Ala Gly Asp Ala Tyr Gly Gly Ser Val Phe Asp 
50 55 60 

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ala Ala Ser Val 
65 70 75 80 

Glu Ala Val Glu Ala Thr Gly Ala Gly Thr Glu Leu Val Ser Tyr Gly 
85 90 95 

Ile Glu Gly Leu Asp Glu Ile Ile Gln Asp Leu Asn Ala Ala Asp Ala 
100 105 110 

Val Pro Gly Val Val Gly Trp Tyr Pro Asp Val Ala Gly Asp Thr Val 
115 120 125 

Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Ser Gly Leu Leu 
130 135 140 

Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val Thr Ser Ser Ala 
145 150 155 160 

Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly Gly Leu Ala Tyr Thr Met 
165 170 175 

Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn Ala Ala Gly Gln 
180 185 190 

Pro Gly Phe Val Thr Ala Gly His Cys Gly Arg Val Gly Thr Gln Val 
195 200 205 

Ser Ile Gly Asn Gly Gln Gly Val Phe Glu Gln Ser Ile Phe Pro Gly 
210 215 220 

Asn Asp Ala Ala Phe Val Arg Gly Thr Ser Asn Phe Thr Leu Thr Asn 
225 230 235 240 

Leu Val Ser Arg Tyr Asn Thr Gly Gly Tyr Ala Thr Val Ala Gly His 
245 250 255 

Asn Gln Ala Pro Ile Gly Ser Ser Val Cys Arg Ser Gly Ser Thr Thr 
260 265 270 

Gly Trp His Cys Gly Thr Ile Gln Ala Arg Gly Gln Ser Val Ser Tyr 
275 280 285 

Pro Glu Gly Thr Val Thr Asn Met Thr Arg Thr Thr Val Cys Ala Glu 
290 295 300 

Pro Gly Asp Ser Gly Gly Ser Tyr Ile Ser Gly Asn Gln Ala Gln Gly 
305 310 315 320 

Val Thr Ser Gly Gly Ser Gly Asn Cys Arg Thr Gly Gly Thr Thr Phe 
325 330 335 

Tyr Gln Glu Val Thr Pro Met Val Asn Ser Trp Gly Val Arg Leu Arg 
340 345 350 

Thr 



<210> 38 

<211> 43 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1346 

<400> 38 

gttcatcgat cgcatcggct gccaccggac cgctccccca gtc 43 

<210> 39 

<211> 38 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1602 

<400> 39 

gcggatccta ttaggtccgg agacggacgc cccaggag 38 

<210> 40 
<211> 1062 
<212> DNA 

<213> Nocardiopsis prasina DSM 15649 
<400> 40 

gccaccggac cactccccca gtcacccacc ccggaggccg acgccgtctc catgcaggag 60 

gcgctccagc gcgacctcgg cctgaccccg cttgaggccg atgaactgct ggccgcccag 120 

gacaccgcct tcgaggtcga cgaggccgcg gccgaggccg ccggtgacgc ctacggcggc 180 

tccgtcttcg acaccgagac cctggaactg accgtcctgg tcaccgactc cgccgcggtc 240 

gaggcggtgg aggccaccgg cgccgggacc gaactggtct cctacggcat cacgggcctc 300 

gacgagatcg tcgaggagct caacgccgcc gacgccgttc ccggcgtggt cggctggtac 360 

ccggacgtcg cgggtgacac cgtcgtgctg gaggtcctgg agggttccgg cgccgacgtg 420 

ggcggcctgc tcgccgacgc cggcgtggac gcctcggcgg tcgaggtgac caccaccgag 480 

cagcccgagc tgtacgccga catcatcggc ggtctggcct acaccatggg cggccgctgt 540 

tcggtcggct tcgcggccac caacgccgcc ggtcagcccg ggttcgtcac cgccggtcac 600 

tgtggccgcg tgggcaccca ggtgaccatc ggcaacggcc ggggcgtctt cgagcagtcc 660 

atcttcccgg gcaacgacgc cgccttcgtc cgcggaacgt ccaacttcac gctgaccaac 720 
ctggtcagcc gctacaacac cggcggctac gccaccgtcg ccggtcacaa ccaggcgccc 780 

atcggctcct ccgtctgccg ctccggctcc accaccggtt ggcactgcgg caccatccag 840 

gcccgcggcc agtcggtgag ctaccccgag ggcaccgtca ccaacatgac gcggaccacc 900 

gtgtgcgccg agcccggcga ctccggcggc tcctacatct ccggcaacca ggcccagggc 960 

gtcacctccg gcggctccgg caactgccgc accggcggga ccaccttcta ccaggaggtc 1020 

acccccatgg tgaactcctg gggcgtccgt ctccggacct aa 1062 

<210> 41 

<211> 353 

<212> PRT 

<213> Nocardiopsis prasina DSM 15649 

<400> 41 

Ala Thr Gly Pro Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val 
1 5 10 15 

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Gly Leu Thr Pro Leu Glu 
20 25 30 

Ala Asp Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu 
35 40 45 

Ala Ala Ala Glu Ala Ala Gly Asp Ala Tyr Gly Gly Ser Val Phe Asp 
50 55 60 

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ser Ala Ala Val 
65 70 75 80 

Glu Ala Val Glu Ala Thr Gly Ala Gly Thr Glu Leu Val Ser Tyr Gly 
85 90 95 

Ile Thr Gly Leu Asp Glu Ile Val Glu Glu Leu Asn Ala Ala Asp Ala 
100 105 110 

Val Pro Gly Val Val Gly Trp Tyr Pro Asp Val Ala Gly Asp Thr Val 
115 120 125 

Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Gly Gly Leu Leu 
130 135 140 

Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val Thr Thr Thr Glu 
145 150 155 160 

Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly Gly Leu Ala Tyr Thr Met 
165 170 175 

Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn Ala Ala Gly Gln 
180 185 190 

Pro Gly Phe Val Thr Ala Gly His Cys Gly Arg Val Gly Thr Gln Val 
195 200 205 

Thr Ile Gly Asn Gly Arg Gly Val Phe Glu Gln Ser Ile Phe Pro Gly 
210 215 220 

Asn Asp Ala Ala Phe Val Arg Gly Thr Ser Asn Phe Thr Leu Thr Asn 
225 230 235 240 

Leu Val Ser Arg Tyr Asn Thr Gly Gly Tyr Ala Thr Val Ala Gly His 
245 250 255 

Asn Gln Ala Pro Ile Gly Ser Ser Val Cys Arg Ser Gly Ser Thr Thr 
260 265 270 

Gly Trp His Cys Gly Thr Ile Gln Ala Arg Gly Gln Ser Val Ser Tyr 
275 280 285 

Pro Glu Gly Thr Val Thr Asn Met Thr Arg Thr Thr Val Cys Ala Glu 
290 295 300 

Pro Gly Asp Ser Gly Gly Ser Tyr Ile Ser Gly Asn Gln Ala Gln Gly 
305 310 315 320 

Val Thr Ser Gly Gly Ser Gly Asn Cys Arg Thr Gly Gly Thr Thr Phe 
325 330 335 

Tyr Gln Glu Val Thr Pro Met Val Asn Ser Trp Gly Val Arg Leu Arg 
340 345 350 

Thr 



<210> 42 

<211> 43 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1603 

<400> 42 

gttcatcgat cgcatcggct gccaccggac cactccccca gtc 43 






